Decision Scientist vs. Data Scientist
I would suggest that Decision Science ends where Data Science begins. Decision Scientists don’t necessarily work with Big Data. Data Scientists specialize in working with Big Data (or in recognizing when something isn’t truly Big Data) and in all the problems associated with that domain.
Guide to big data analytics tools, trends and best practices
• Business benefits
• New developments
• News Stories
• Best Practices
• Videos
• Glossary
Brief Guide to Big Data and Predictive Analytics for non-experts
Brief guide to Big Data and Predictive Analytics for non-experts suggests key books, films, and websites to learn more.
Introduction to Linear Models and Matrix Algebra MOOC starts this Monday Feb 16
Matrix algebra is the language of modern data analysis. We use it to develop and describe statistical and machine learning methods, and to code efficiently in languages such as R, MATLAB and Python. Concepts such as principal component analysis (PCA) are best described with matrix algebra. It is particularly useful for describing linear models, which are everywhere in data analysis. ANOVA, linear regression, limma, edgeR, DESeq, most smoothing techniques, and batch correction methods such as SVA and ComBat are based on linear models. In this two-week MOOC we will describe the basics of matrix algebra, demonstrate how linear models are used in the life sciences and show how to implement these efficiently in R.
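As a rough illustration of how a linear model reduces to matrix algebra, the sketch below fits a straight line by solving the normal equations (X^T X) beta = X^T y for a design matrix X whose columns are a constant 1 and the predictor x. It is written in plain Python rather than R so that it runs without any packages; the function name and the toy data are hypothetical and are not taken from the course.

```python
def fit_simple_linear_model(x, y):
    """Least-squares fit of y = intercept + slope * x via the normal equations."""
    n = len(x)
    # Entries of the 2x2 matrix X^T X = [[n, sx], [sx, sxx]]
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    # Entries of the vector X^T y = [sy, sxy]
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Invert the 2x2 matrix explicitly; det == 0 means X^T X is singular
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Toy data that lie exactly on the line y = 1 + 2x
print(fit_simple_linear_model([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

With more than one predictor the same equations hold, but the matrix is larger and is usually solved with a numerical library rather than by hand.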
Five Signals Your Organization Is Ready For Big Data
1) A Shared Understanding of Big Data Exists Within Your Organization
2) Business Intelligence Is Part of Decision-Making Processes
3) Experimenting and Innovation Is Encouraged
4) You Work With High Quality Data
5) The Board Understands the Value of Big Data
What Do I Do With All This Data?
1. Getting Started – Establish an ROI
2. Perform a Business Needs Analysis
3. High Quality Data for Smarter Decisions
Eight New Ideas From Data Visualization Experts
1. Make Interactive Graphs
2. Make Graphs With IPython Widgets In Plotly & Domino
3. Reproducible Research with Plotly & Overleaf
4. Use Statistical Graphs
5. Use 3D Graphs
6. Embed Interactive Graphs With Dashboards
7. Customize Interactive Graphs With JavaScript
8. Embed Interactive Graphs With Shiny
Cache Rules Everything Around Me
In this post, we’re going to be discussing:
• Rcpp
• R’s C interface
• The importance of CPU caches
• Performance benchmarking
Principal Component Analysis
Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It’s often used to make data easy to explore and visualize.
Illustration of principal component analysis (PCA)
A principal component analysis is a way to reduce a data set of numeric vectors to a lower dimensionality, which makes it possible to visualize the data set in three or fewer dimensions. Have a look at this use case. I’ll try to explain the motivation using a simple example.
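To make the idea concrete, here is a minimal sketch that reduces two-dimensional points to one dimension. It assumes the standard formulation of PCA (center the data, compute the sample covariance matrix, and project onto its leading eigenvector) and uses the closed-form eigen-decomposition of a symmetric 2x2 matrix so that no libraries are needed. The function name and the sample points are hypothetical, not taken from the linked article.

```python
import math


def first_principal_component(points):
    """Return (direction, variance, scores) for the first PC of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(px - mean_x, py - mean_y) for px, py in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form)
    root = math.sqrt((sxx - syy) ** 2 + 4 * sxy * sxy)
    variance = (sxx + syy + root) / 2

    # Matching eigenvector; axis-aligned when the covariance is zero
    if sxy != 0:
        vx, vy = variance - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # One-dimensional coordinates of each point along that direction
    scores = [cx * direction[0] + cy * direction[1] for cx, cy in centered]
    return direction, variance, scores


direction, variance, scores = first_principal_component([(0, 0), (1, 1), (2, 2)])
print(direction, variance, scores)
```

For points lying on the line y = x, the first component points along that line and the one-dimensional scores retain all of the variation in the data.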
May Bayes Theorem Be with You
The frequentist paradigm enjoys the most widespread acceptance for statistical analysis. Frequentist concepts such as confidence intervals and p-values dominate introductory statistics curriculums from science departments to business schools, and frequentist methods are still the go-to tools for most practitioners. However, all practitioners in data science and statistics would benefit from integrating Bayesian techniques into their arsenal. This post discusses two reasons why:
1. Bayesian statistics offers a framework to handle uncertainty that is based on a more intuitive mental model than the frequentist paradigm.
2. Bayesian regression has close ties to regularization techniques while also giving us a principled approach to explicitly expressing prior beliefs. This helps us combat multicollinearity and overfitting.
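The link between Bayesian regression and regularization in the second point can be sketched with a one-coefficient model. Assume y = beta * x + noise, with Gaussian noise of variance noise_var and a Gaussian prior beta ~ N(0, prior_var). The posterior mean of beta then equals the ridge regression estimate with penalty noise_var / prior_var. The function name, parameter names and data below are hypothetical and are not taken from the post.

```python
def posterior_mean_slope(x, y, noise_var, prior_var):
    """Posterior mean of beta for y = beta * x + noise, beta ~ N(0, prior_var).

    This is the same value ridge regression returns with
    penalty lambda = noise_var / prior_var.
    """
    ridge_penalty = noise_var / prior_var
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + ridge_penalty)


x = [1, 2, 3]
y = [2, 4, 6]  # exactly y = 2x, so the least-squares slope is 2

# A tight prior around zero shrinks the estimate; a vague prior barely does
print(posterior_mean_slope(x, y, noise_var=1.0, prior_var=1.0))
print(posterior_mean_slope(x, y, noise_var=1.0, prior_var=1e9))
```

A smaller prior variance expresses a stronger belief that the coefficient is near zero, which acts like a larger regularization penalty and pulls the estimate away from the unregularized fit.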
How to Machine Learn
In 3 Steps
An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture
Practical Data Science. Slides from a one-hour introductory talk about predictive modeling using machine learning, with a focus on supervised learning.
An Introduction on How to Make Beautiful Charts With R and ggplot2
These charts were made using ggplot2, an add-on package for the R programming language, along with lots of iterative improvement over the months. R notably has chart-making capabilities built into the language by default, but they are not easy to use and often produce very simplistic charts. Enter ggplot2, which allows users to create full-featured and robust charts with only a few lines of code.
Forecasting events, from disease outbreaks to sales to cancer research