2015 Predictions Reveal a More Pragmatic Approach to Big Data
2015 will be the year of pragmatism as global enterprises turn their focus to getting value from their big data investments.
1. A more pragmatic approach to big data will prevail.
2. Good old-fashioned MapReduce will dominate production.
3. Next year will see the resurrection and rise of the enterprise Java developer.
4. The big data “thang” will continue to remain convoluted.
5. 2015 will be a “show me the money” year when it comes to Hadoop.
6. Elephants will fly – Hadoop will make a push to a worldwide phenomenon.

Parallelism via “parSapply”
In an earlier post, I used mclapply to kick off parallel R processes and to demonstrate inter-process synchronization via the flock package. Although I have been using this approach to parallelism for a few years now, I admit, it has certain important disadvantages. It works only on a single machine, and also, it doesn’t work on Windows.

Notes on Shrinkage & Prediction in Hierarchical Models
Ecologists increasingly use mixed effects models, where some intercepts or slopes are fixed, and others are random (or varying). Often, confusion exists around whether and when to use fixed vs. random intercepts/slopes, which is understandable given their multiple definitions.

UEFA Champions League Round of 16 draw
Each year after the group stage comes the much-awaited draw for the round of 16, which essentially defines a team’s fate. In principle this is not too complicated: there are 16 teams from which we need to generate 8 games, which would be no problem if the teams could be drawn without restrictions. But there are quite a few:
1. Group winners only play group runners-up
2. You can’t play a team that was in the same group
3. Teams from the same league can’t play each other
Thus there is some combinatorics to solve. Sebastian created a Shiny app and the necessary R code to generate the probabilities of who plays whom.
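
To give a feel for the combinatorics, here is a minimal Python sketch (not Sebastian's R code) that enumerates every complete draw satisfying the three rules and counts how often each pairing occurs. The four-group field, the team names and the leagues are hypothetical toy data, and weighting every valid complete draw equally is a simplifying assumption; the real sequential draw procedure can yield somewhat different probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Hypothetical toy field, NOT real draw data: (team, group, league).
WINNERS = [("W1", "A", "ESP"), ("W2", "B", "ENG"),
           ("W3", "C", "GER"), ("W4", "D", "ESP")]
RUNNERS_UP = [("R1", "A", "ENG"), ("R2", "B", "GER"),
              ("R3", "C", "ESP"), ("R4", "D", "ITA")]


def allowed(winner, runner_up):
    """Rules 2 and 3: different group and different league."""
    return winner[1] != runner_up[1] and winner[2] != runner_up[2]


def valid_draws(winners, runners_up):
    """Yield each full pairing of winners with runners-up (rule 1)
    in which every single game also respects rules 2 and 3."""
    for perm in permutations(runners_up):
        if all(allowed(w, r) for w, r in zip(winners, perm)):
            yield list(zip(winners, perm))


def pairing_probabilities(winners, runners_up):
    """Share of valid draws containing each pairing.
    Assumption: every valid complete draw is equally likely."""
    draws = list(valid_draws(winners, runners_up))
    counts = Counter((w[0], r[0]) for draw in draws for w, r in draw)
    probs = {pair: Fraction(n, len(draws)) for pair, n in counts.items()}
    return probs, len(draws)


probs, n_draws = pairing_probabilities(WINNERS, RUNNERS_UP)
for (winner, runner_up), p in sorted(probs.items()):
    print(f"{winner} vs {runner_up}: {p}")
```

Brute force over permutations is fine for 8 games (8! = 40,320 candidate draws); the restrictions only prune that list.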

Alfred Hitchcock and a Classic Forecasting Scam
After six correct picks in a row, this savant has established his ability, wouldn’t you think? And he certainly deserves a cut of earnings on the wagers you’ve started placing on these forecasts, as well as a healthy gratuity for the forthcoming 7th prediction, a stock pick which is promised to make you rich.
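
The arithmetic behind such a streak is easy to sketch. The excerpt does not spell out the mechanics, so the code below assumes the classic version of the scam: each round, half of the remaining mailing list receives one prediction and the other half the opposite one, and only the half that saw a correct call is contacted again. The function names are made up for this illustration.

```python
def recipients_with_perfect_record(initial_recipients, rounds):
    """Track how many recipients have seen only correct picks.
    Assumption: each round the list is split in half and the half
    that received the wrong prediction is dropped."""
    remaining = initial_recipients
    history = [remaining]
    for _ in range(rounds):
        remaining //= 2
        history.append(remaining)
    return history


def mailing_list_needed(rounds, final_marks=1):
    """Initial list size so that `final_marks` people see `rounds`
    correct picks in a row."""
    return final_marks * 2 ** rounds


print(recipients_with_perfect_record(64, 6))
print(mailing_list_needed(6))
print(mailing_list_needed(6, final_marks=100))
```

Under this assumption, a mailing list of only 64 names is enough to leave one recipient who has witnessed six correct picks in a row, with no forecasting ability involved.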

Next-Gen Business Analytics Paving the Way to Success in 2015
Business analytics provides solutions that help organizations make key decisions and set business strategy by gathering extensive data and information. The data involved are not simple but complex: profits, losses, transactions, marketing returns, customer feedback and so forth. Business analytics software is normally used to produce this kind of information. The term is not new; however, it has become more precise and structured over time. People often need a proper framework to evaluate the huge amount of data and information available.

QQ-plots in R vs. SPSS – A look at the differences
We teach two software packages, R and SPSS, in Quantitative Methods 101 for psychology freshmen at Bremen University (Germany). Sometimes confusion arises when the software packages produce different results. This may be due to specifics in the implementation of a method or, as in most cases, to different default settings. One of these situations occurs when the QQ-plot is introduced. Below we see two QQ-plots, produced by SPSS and R, respectively.
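
As an illustration of how a default setting alone can shift a QQ-plot, the Python sketch below computes the theoretical normal quantiles under two common plotting-position conventions of the form p_i = (i - a) / (n + 1 - 2a): a = 3/8 (Blom) and a = 1/2 (Hazen). The excerpt does not say which defaults actually differ between R and SPSS, so plotting positions are only one plausible candidate, and the function names are chosen for this example.

```python
from statistics import NormalDist


def plotting_positions(n, a):
    """Probabilities p_i = (i - a) / (n + 1 - 2a) for i = 1..n."""
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def theoretical_quantiles(n, a):
    """Standard normal quantiles at the chosen plotting positions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(p) for p in plotting_positions(n, a)]


n = 20
blom = theoretical_quantiles(n, a=3 / 8)   # Blom's convention
hazen = theoretical_quantiles(n, a=1 / 2)  # Hazen's convention

# The two conventions disagree most in the tails of the plot.
for i in (0, n // 2, n - 1):
    print(f"point {i + 1:2d}: Blom {blom[i]: .3f}   Hazen {hazen[i]: .3f}")
```

The same data therefore land on slightly different theoretical quantiles depending on the convention, which is enough to make two otherwise identical QQ-plots look different at the extremes.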

Experimental downdating algorithm for Leave-One-Out CV in RDA
In this post, I want to demonstrate a piece of experimental code for a downdating algorithm for Leave-One-Out (LOO) Cross Validation in Regularized Discriminant Analysis.

Hierarchical Clustering with R (feat. D3.js and Shiny)
Agglomerative hierarchical clustering is a simple, intuitive and well-understood method for clustering data points. I used it with good results in a project to estimate the true geographical position of objects based on measured estimates. With this tutorial I would like to describe the basics of this method, how to implement it in R with hclust and some ideas on how to decide where to cut the tree. This was also a great opportunity for composing another Shiny/D3.js app (GitHub, shinyapps.io) – something I had wanted to do for a while. At the end of the text I write a bit about what I learned in that regard.
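
For readers who want to see the basic idea without R, here is a naive Python sketch of agglomerative clustering. It is not hclust and not the code from the tutorial: it assumes complete linkage and Euclidean distance, uses made-up points, and stands in for "cutting the tree" by stopping as soon as the closest pair of clusters is farther apart than a chosen height.

```python
from itertools import combinations
from math import dist


def complete_linkage(cluster_a, cluster_b):
    """Cluster distance = largest pairwise distance between points."""
    return max(dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerate(points, cut_height):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters; stop once the closest pair is farther apart
    than cut_height (the equivalent of cutting the tree there)."""
    clusters = [[p] for p in points]
    merge_heights = []
    while len(clusters) > 1:
        (i, j), height = min(
            (((i, j), complete_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if height > cut_height:
            break
        merge_heights.append(height)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so j is still valid
        del clusters[j]
    return clusters, merge_heights


# Hypothetical example data: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, heights = agglomerate(points, cut_height=2.0)
print(clusters)
print(heights)
```

Recomputing all pairwise cluster distances in every pass is slow, so this is only meant for a handful of points; the recorded merge heights are the same quantity one would read off a dendrogram when deciding where to cut.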

Big Data Tool Analyzes Intentions: Cool Or Creepy?
Lexalytics’ intention analysis tool determines what you’re going to do before you do it, the company says. This goes beyond sentiment analysis.

Decide which frequent flyer program is best for your city
We’ve posted in the past about which airline you should be loyal to, but we always felt guilty because we only showed results for the New York metro area and there are Decision Science News readers all over the USA (and all over the World too, though in most countries it’s an easy decision: go with the national airline).
Since then, we’ve learned about data for every flight in the USA, which makes it pretty straightforward to generate the number of departures for each airline in every US metro area.

Most Demanded Data Science and Data Mining Skills
Our analysis of most demanded data scientist skills shows that Data Science is a team effort focused on business analytics, with top 5 platform skills being SQL, Python, R, SAS, and Hadoop.

Open Innovation in the Age of Big Data
Open platforms for data analysis are more important than ever with big data and increasing access to heterogeneous data sources for analysts.

Cartography with complex survey data
Visualizing complex survey data is something of an art. If the data has been collected and aggregated to geographic units (say, counties or states), a choropleth is one option. But if the data aren’t so neatly arranged, making visual sense often requires some form of smoothing to represent it on a map.

Does the Future Lie with Embedded BI?
In the near future we might see BI take another direction: Rather than companies merely purchasing dashboard reporting software for the purposes of internal usage, we’ll be seeing a surge in companies looking to integrate advanced analytics and reporting into their own products. Welcome to the world of embedded analytics.

Contextualizing Big Data For The Everyday Business
As data grows at a monstrous speed, businesses are left grappling with how to curate it or even make sense of it. According to a Gartner report, through 2015, 85% of the Fortune 500 companies will fail to exploit big data for competitive advantage. As extraction of data becomes difficult, organizations will have to develop resources and strategies to tap into the unlimited potential of big data, or else they will never really gain from the information-rich big data that holds the golden key to future success.

IBM Watson Analytics vs. Microsoft Azure Machine Learning (Part 1)
The IBM Watson Analytics prototype seeks to abstract away data science, taking ordinary natural language queries and answering them based on the content of uploaded datasets. Microsoft Azure Machine Learning goes the opposite route, streamlining existing data mining methodology for fast results and integration with Microsoft’s other cloud services.

What every machine learning package can learn from Vowpal Wabbit
Vowpal Wabbit (VW) is one of the overlooked gems of machine learning. The open source brainchild of John Langford and his collaborators at Yahoo and Microsoft Research, VW can teach us a lot about modern, scalable learning.

Python: Exploring Seaborn and Pandas based plot types in HoloViews
In this notebook we’ll look at interfacing between the composability and ability to generate complex visualizations that HoloViews provides and the great looking plots incorporated in the seaborn library. Along the way we’ll explore how to wrap different types of data in a number of Seaborn View types.

Building and Running a Recommendation Engine at Any Scale
This post shows you how to build a powerful, scalable, customizable recommendation engine using Mortar Data and run it on AWS. You’ll fork an open-source template project, so you won’t have to build from scratch, and you’ll start seeing results fast.