5  A  B  C  D  E  F  G  H  I  K  L  M  N  O  P  R  S  T  U  W  
Courses = 145  
5 

5 Minutes With Ingo (RapidMiner) 
Spend 5 minutes with Dr. Ingo Mierswa, CEO, cofounder and data scientist in residence, at RapidMiner. In this series, Ingo shares his thoughts about trends, challenges and opportunities in analytics and also breaks down complex data science concepts into understandable segments. Plus, Ingo has some fun with his friend Data Scientist Number 7, Whiteboard Number 2, Marla Mierswa – RapidMiner’s Chief Furry Officer, and a series of other special guests and RapidMiners. 
A 

A Hands-on Introduction to Statistics with R (Datacamp) 
This selection of courses is designed to be a comprehensive yet friendly introduction to fundamental concepts in statistics. The focus is on statistics, but you will make use of the statistical programming language R. For those new to R, an introduction to the R programming language is provided. This course is, quite literally, for everyone, whether you’re new to statistics, need a refresher course, or are a relatively advanced researcher or analyst. 
A/B Testing (Udacity) 
This course will cover the design and analysis of A/B tests, also known as split tests, which are online experiments used to test potential improvements to a website or mobile application. Two versions of the website are shown to different users – usually the existing website and a potential change. Then, the results are analyzed to determine whether the change is an improvement worth launching. This course will cover how to choose and characterize metrics to evaluate your experiments, how to design an experiment with enough statistical power, how to analyze the results and draw valid conclusions, and how to ensure that the participants of your experiments are adequately protected. 
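As a rough illustration of the "analyze the results" step described above, here is a minimal Python sketch of a two-proportion z-test on conversion counts. It is not taken from the course: the function name `two_proportion_z_test` and the sample counts are hypothetical, and a real analysis would also fix the metric, sample size, and significance level before the experiment starts.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and users for the existing version (A).
    conv_b / n_b: conversions and users for the proposed change (B).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution,
    # using the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 users convert on A, 130 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; whether the change is worth launching still depends on the size of the effect and its practical cost.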
Advanced Statistics (Saylor) 
Welcome to the amazing world of statistics! You might be thinking that the topic is just about a bunch of charts, graphs, and odd-looking formulas, but in fact, it is a fascinating and challenging field of study. In this course, we will indeed study those charts and graphs, and yes, that array of complex formulas. But beyond those tools, we will find an entire new way of thinking, a new way of approaching and understanding the world around us. We will learn why taking aspirin helps lower the risk and severity of a heart attack; how researchers have determined that the more friends you have on a social networking site, the more likely you are to have fewer friends in real life; and how political pollsters almost always know the outcome of an election even before the polls open. The course is divided into 10 units of study. The first two units are devoted to simple statistical calculations and graphical representations of data. Most of this material will be familiar to you from previous math or science courses. Unit 3 is devoted to a foundational concept of statistics, which is the study of probability. Unit 4 will introduce you to random variables and a very important distribution called the binomial distribution. Unit 5 will focus entirely on one topic: the bell curve. You may have studied the bell curve, also called the normal distribution, in other courses, but this unit will make sure that you are confident and competent in knowing its properties, its uses, and its central importance to all of the material in the rest of the course and in the entire field of statistics. The first five units build the foundation of concepts, vocabulary, knowledge, and skills for success in the remainder of the course. In the final five units, we will take the plunge into the domain of inferential statistics, where we make statistical decisions based on the data that we have collected. 
In Unit 6, we will learn how to design statistically sound experiments and studies, in order to collect valid, reliable data. In Unit 7 and Unit 8, we will learn how to analyze the data, using confidence intervals and hypothesis tests, to make statistically sound decisions and inferences about our results. The final two units will be devoted to two topics frequently used in statistical research: linear regression and chi-square analysis. These exotic-sounding topics will act as springboards for your further study of the discipline in your college undergraduate or graduate programs. We will use a variety of resources in addition to the text. An online course needs to be multidimensional so that you won’t be lulled into a daily grind of textbook reading and doing homework problems. In light of this, you will supplement your course text with video lessons, interactive applets, and research into some of statistics’ most interesting, controversial, and fascinating experiments and studies. By the end of the course, you will have mastered the foundational concepts of a field of endeavor that will assist you in studying and understanding the world around you as never before. 
Algorithms (Saylor) 
This course focuses on the fundamentals of computer algorithms, emphasizing methods useful in practice. We look into algorithm analysis as a way to understand the behavior of computer programs as a function of their input size. Using big-O notation, we classify algorithms by their efficiency. We look into basic algorithm strategies and approaches to problem solving. Some of these approaches include the divide and conquer method, dynamic programming, and greedy programming paradigms. Sorting and searching algorithms are discussed in detail as they form part of a solution to a large number of problems solved using computers. We also provide an introduction to graph theory and graph algorithms, as they are also used in many computer-based applications today. We conclude the course with a look into a special class of problems called the NP-complete problems. 
An Introduction to Interactive Programming in Python (Coursera) 
This course is designed to be a fun introduction to the basics of programming in Python. Our main focus will be on building simple interactive games such as Pong, Blackjack and Asteroids. 
An Introduction to Interactive Programming in Python (Part 1) (Coursera) 
This two-part course is designed to help students with very little or no computing background learn the basics of building simple interactive applications. Our language of choice, Python, is an easy-to-learn, high-level computer language that is used in many of the computational courses offered on Coursera. To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple. These applications will involve windows whose contents are graphical and respond to buttons, the keyboard and the mouse. The primary method for learning the course material will be to work through multiple ‘mini-projects’ in Python. To make this class enjoyable, these projects will include building fun games such as Pong, Blackjack, and Asteroids. When you’ve finished our course, we can’t promise that you will be a professional programmer, but we think that you will learn a lot about programming in Python and have fun while you’re doing it. 
An Introduction to Interactive Programming in Python (Part 2) (Coursera) 
This two-part course is designed to help students with very little or no computing background learn the basics of building simple interactive applications. Our language of choice, Python, is an easy-to-learn, high-level computer language that is used in many of the computational courses offered on Coursera. To make learning Python easy, we have developed a new browser-based programming environment that makes developing interactive applications in Python simple. These applications will involve windows whose contents are graphical and respond to buttons, the keyboard and the mouse. The primary method for learning the course material will be to work through multiple ‘mini-projects’ in Python. To make this class enjoyable, these projects will include building fun games such as Pong, Blackjack, and Asteroids. When you’ve finished our course, we can’t promise that you will be a professional programmer, but we think that you will learn a lot about programming in Python and have fun while you’re doing it. 
Applied Logistic Regression (Coursera) 
This Applied Logistic Regression course provides theoretical and practical training for epidemiologists, biostatisticians and professionals of related disciplines in statistical modeling with particular emphasis on logistic regression. The increasingly popular logistic regression model has become the standard method for regression analysis of binary response data in the health sciences. By the end of this course, students should • Master methods of statistical modeling when the response variable is binary. • Be confident users of the Stata package for computing binary logistic regression models. This is a hands-on, applied course where students will become proficient at using computer software to analyze data drawn primarily from the fields of medicine, epidemiology and public health. There will be many practical examples and homework exercises in this class to help you learn. If you fully apply yourself in this course and complete all of the homework, you will have the opportunity to master various methods of statistical modeling and you will become a more confident user of the Stata* package for computing linear, polynomial and multiple regression. 
Artificial Intelligence (Saylor) 
CS405 introduces the field of artificial intelligence (AI). Materials on AI programming, logic, search, game playing, machine learning, natural language understanding, and robotics introduce the student to AI methods, tools, and techniques, their application to computational problems, and their contribution to understanding intelligence. Because each of these topics could be a course unto itself, the material is introductory and not complete. Each unit presents the problem a topic addresses, current progress, and approaches to the problem. The readings include and cite more materials that are referenced in this course, and students are encouraged to use these resources to pursue topics of interest after this course. 
Artificial Intelligence Planning (Coursera) 
The course aims to provide a foundation in artificial intelligence techniques for planning, with an overview of the wide spectrum of different problems and approaches, including their underlying theory and their applications. 
B 

Basics of Machine Learning (Datacamp) 
Naive Bayes, decision trees, zero-frequency, missing data, ID3 algorithm, information gain, overfitting, confidence intervals, nearest-neighbour method, Parzen windows, KD trees, K-means, scree plot, Gaussian mixtures, EM algorithm, dimensionality reduction, principal components, eigenfaces, agglomerative clustering, single-link vs. complete-link, Lance-Williams algorithm 
Bayesian Modelling in Python 
Welcome to ‘Bayesian Modelling in Python’ – a tutorial for those interested in learning how to apply Bayesian modelling techniques in Python (PyMC3). This tutorial doesn’t aim to be a Bayesian statistics tutorial – but rather a programming cookbook for those who understand the fundamentals of Bayesian statistics and want to learn how to build Bayesian models using Python. The tutorial sections and topics can be seen below. 
Big Data Analysis with Revolution R Enterprise (Datacamp) 
Revolution R Enterprise allows R users to process, visualize, and model terabyte-class data sets in a fraction of the time of legacy products without requiring expensive or specialized hardware. Introductory course for accomplished R users to experience the functionality of Revolution R Enterprise. 
Big Data in Education (Coursera) 
The emerging research communities in educational data mining and learning analytics are developing methods for mining and modeling the increasing amounts of fine-grained data becoming available about learners. In this class, you will learn about these methods, and their strengths and weaknesses for different applications. You will learn how to use each method to answer education research questions and to drive intervention and improvement in educational software and systems. Methods will be covered both at a theoretical level, and in terms of how to apply and execute them using standard software tools. Issues of validity and generalizability will also be covered, towards learning to establish how trustworthy and applicable the results of an analysis are. 
Build Intelligent Applications (Coursera) 
Master machine learning fundamentals in five hands-on courses. Extract insights from data, build self-improving applications, and apply algorithms to real-world problems. This Specialization provides a case-based introduction to the exciting, high-demand field of machine learning. You’ll learn to analyze large and complex datasets, build applications that can make predictions from data, and create systems that adapt and improve over time. In the final Capstone Project, you’ll apply your skills to solve an original, real-world problem through implementation of machine learning algorithms. • Machine Learning Foundations: A Case Study Approach • Regression • Classification • Clustering & Retrieval • Recommender Systems & Dimensionality Reduction 
C 

Carnegie Mellon University  Machine Learning Videos 
Cluster Analysis in Data Mining (Coursera) 
Discover the basic concepts of cluster analysis, and then study a set of typical clustering methodologies, algorithms, and applications. This includes partitioning methods such as k-means, hierarchical methods such as BIRCH, density-based methods such as DBSCAN/OPTICS, probabilistic models, and the EM algorithm. Learn clustering and methods for clustering high-dimensional data, streaming data, graph data, and networked data. Explore concepts and methods for constraint-based clustering and semi-supervised clustering. Finally, see examples of cluster analysis in applications. 
Computational Methods for Data Analysis (Coursera) 
Exploratory and objective data analysis methods applied to the physical, engineering, and biological sciences. 
Computational Statistics in Python  A Tutorial 
Computing for Data Analysis (Coursera) 
This course is about learning the fundamental computing skills necessary for effective data analysis. You will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods. 
Core Concepts in Data Analysis (Coursera) 
Learn both theory and application for basic methods that have been invented either for developing new concepts – principal components or clusters, or for finding interesting correlations – regression and classification. This is preceded by a thorough analysis of 1D and 2D data. 
D 

Data Analysis (Coursera) 
Learn about the most effective data analysis methods to solve problems and achieve insight. 
Data Analysis and Statistical Inference (Coursera) 
This course introduces you to the discipline of statistics as a science of understanding and analyzing data. You will learn how to effectively make use of data in the face of uncertainty: how to collect data, how to analyze data, and how to use data to make inferences and conclusions about real world phenomena. 
Data Analysis and Statistical Inference (Datacamp) 
In this course you are introduced to the discipline of statistics as a science of understanding and analyzing data. You will learn how to effectively make use of data in the face of uncertainty: how to collect data, how to analyze data, and how to use data to make inferences and conclusions about real world phenomena. 
Data Analysis and Statistical Inference (Datacamp) 
This interactive DataCamp course complements the Coursera course Data Analysis and Statistical Inference by Mine Çetinkaya-Rundel. For every lesson given at Coursera, you can follow interactive exercises in the comfort of your browser to master the different topics. 
Data Analysis in R, the data.table Way (Datacamp) 
The data.table package is rapidly making its name as the number one choice for handling large datasets in R. This course will bring you from data.table novice to data.table expert. 
Data Analysis Learning Path 
The Data Analysis learning path provides a short but intensive introduction to the field of data analysis. The path is divided into three parts. In part 1, we learn general programming practices (software design, version control) and tools (Python, SQL, Unix, and Git). In part 2, we learn R and focus more narrowly on data analysis, studying statistical techniques, machine learning, and presentation of findings. Part 3 includes a choice of elective topics: visualization, social network analysis, and big data (Hadoop and MapReduce). Choose from any or all of them to enrich your understanding and skills. The course consists of free online lectures, homework assignments, quizzes and projects, and will take around 350-400 hours. There will also be a capstone project at the end that you can use to demonstrate your skills to potential employers or for a school application. This is an intensive path with a lot of material to learn, but at the end, you will know all the tools and techniques you need to start analyzing data: how to manipulate data, apply statistical and machine learning techniques, and analyze and visualize results. You should also be prepared to begin a career in data analysis. 
Data Analysis with R (Udacity) 
Exploratory Data Analysis (EDA) is an approach to data analysis for summarizing and visualizing the important characteristics of a data set. Promoted by John Tukey, exploratory data analysis focuses on exploring data to understand the data’s underlying structure and variables, to develop intuition about the data set, consider how that data set came into existence, and decide how it can be investigated with more formal statistical methods. 
Data Analyst (Udacity) 
This nanodegree program is the most efficient curriculum to prepare you for a job as a Data Analyst. You will learn to: • Wrangle, extract, transform, and load data from various databases, formats, and data sources • Use exploratory data analysis techniques to identify meaningful relationships, patterns, or trends from complex data sets • Classify unlabeled data or predict into the future with applied statistics and machine learning algorithms • Communicate data analysis and findings well through effective data visualizations You will work with your peers and advisors on projects approved by leading employers as the critical indicators of job-readiness. We designed these projects with expert Data Analysts, Data Scientists, and hiring managers. 
Data Lakes for Big Data (EMC) 
Each day an astounding amount of data is generated from just about everything around us – from our mobile devices to our health care provider to where we shop for groceries – just to name a few. Big Data is a term used to describe the volume of data, variety or type – both structured and unstructured, and speed (real time or near real time). Businesses have become increasingly focused on analyzing big data to increase revenue, drive down costs, and reduce risk to the business. In addition, organizations have started to face infrastructure and platform challenges to store, manage and analyze these vast amounts of varied data for quick turnaround. This free, open, online, interactive course will expose you to the value, opportunity, and insights that Big Data can provide. It highlights the Federation Business Data Lake as an integrated solution that stores and provides access to Big Data for real-time, rapid analytics and predictive modeling. If you are a decision maker or influencer keen to learn and define your organization’s big data strategy, if you work directly or indirectly with data, if you are a student or just purely interested in Big Data and Data Lakes, this introductory course is for you! There are no specific prerequisites expected to attend this course. 
Data Manipulation in R with dplyr (Datacamp) 
In this interactive course, you will learn how to perform sophisticated data manipulation tasks using dplyr. Master the five verbs of data manipulation, and complementing techniques to chain your operations, perform group-wise calculations and access data stored in a database outside R. 
Data Mining Course (KDNuggets) 
Here are the teaching modules for a one-semester introductory course on Data Mining, suitable for advanced undergraduates or first-year graduate students. The teaching modules were created by Gregory Piatetsky-Shapiro and Gary Parker. 
Data Science (Harvard) 
Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. 
Data Science (UCIrvine) 
Corporations have dramatically increased investments in their “digital enterprise” in the past few years. It has been estimated that by 2020, IT departments will be monitoring 50 times more data than they are today. This tidal wave of data is driving unprecedented demand for those with the skills required to manage and leverage these very large data sets into a competitive advantage. The curriculum is designed to help meet the expanding needs for data scientists who are skilled in the utilization of a unique blend of science, art and business. These professionals understand how to automate methods of collecting and analyzing data and utilize techniques to discover previously hidden insights that can profoundly impact the success of any business. Understand the skills needed to effectively collect and manage big data, perform data-driven discovery and prediction, and extract value and competitive intelligence for your organization. This program provides the skills required to become a data scientist and provides existing data analysts with opportunities to broaden skills. Learn topics such as: utilizing concepts in on- and off-cloud; scalable data engineering (inspecting, cleaning, transforming, and modeling data), unstructured data and NoSQL; computational statistics; pattern recognition; data mining/predictive analytics; machine learning; data visualization; and high performance software and hardware. 
Data Science and Machine Learning Essentials (edx) 
Demand for data science talent is exploding. Learn these essentials with experts from MIT and the industry, partnering with Microsoft to help develop your career as a data scientist. By the end of this course, you will know how to build and derive insights from data science and machine learning models. You will learn key concepts in data acquisition, preparation, exploration and visualization along with examples on how to build a cloud data science solution using Azure Machine Learning, R & Python. Data Science is an essential skill for analyzing and deriving useful insights from data, big and small. McKinsey estimates that by 2018, a 500,000-strong workforce of data scientists will be needed in the US alone. The resulting talent gap must be filled by a new generation of data scientists. This course is organized into 5 weekly modules each concluding with a quiz. By achieving a passing grade in the final course assessment you will receive a certificate demonstrating that you have acquired data science skills and knowledge. Apart from answering your questions on the forum, faculty will host an office hour to address questions you may have while undertaking this course. 
Data Science Capstone (Coursera) 
The capstone project class will allow students to create a usable/public data product that can be used to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners. 
Data Science Lectures (Harvard Extension School) 
Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. 
Data to Insight: An Introduction to Data Analysis (FutureLearn) 
This course is a hands-on introduction to statistical data analysis that emphasises fundamental concepts and practical skills. 
Data Visualization (Coursera) 
Learn to present data to an observer in a way that yields insight and understanding. The first week focuses on the infrastructure for data visualization. It introduces elementary graphics programming, focusing primarily on two-dimensional vector graphics and the programming platforms for graphics. This infrastructure will also include lessons on the human side of visualization, studying human perception and cognition to gain a better understanding of the target of the data visualization. The second week will utilize the knowledge of graphics programming and human perception in the design and construction of visualizations, starting with simple charts and graphs and incorporating animation and user interactivity. The third week expands the data visualization vocabulary with more sophisticated methods, including hierarchical layouts and networks. The final week focuses on visualization of database and data mining processes, with methods specifically focused on visualization of unstructured information, such as text, and systems for visual analytics that provide decision support. 
Data Visualization in R with ggvis (Datacamp) 
Learn to create static and interactive graphs to display distributions, relationships, model fits, and more. Get familiar with ggvis and its grammar of graphics. 
Decision Making in a Complex and Uncertain World (FutureLearn) 
This course will teach you the first principles of complexity, uncertainty and how to make decisions in a complex world. 
Deep Learning Tutorial (Stanford) 
This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. By working through it, you will also get to implement several feature learning/deep learning algorithms, get to see them work for yourself, and learn how to apply/adapt these ideas to new problems. This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, gradient descent). If you are not familiar with these ideas, we suggest you go to this Machine Learning course and complete sections II, III, IV (up to Logistic Regression) first. 
Developing Data Products (Coursera) 
Learn the basics of creating data products using Shiny, R packages, and interactive graphics. This is the ninth course in the Johns Hopkins Data Science Specialization. 
Digital Analytics Fundamentals (Google) 
This course provides a foundation for marketers and analysts seeking to understand the core principles of digital analytics and to improve business performance through better digital measurement. 
E 

EspressoWebinars (SAP) 
– 
Exploratory Data Analysis (Coursera) 
Learn the essential exploratory techniques for summarizing data. This is the fourth course in the Johns Hopkins Data Science Specialization. 
F 

Factor Analysis and SEM 
This video provides an introduction to factor analysis and SEM starting from the basic principles and building up from there. The course covers model description, notation, estimation methods in factor analysis, measures of model fit, hypothesis testing, and identification of models. 
Foundations of Strategic Business Analytics (Coursera) 
With this course, you’ll have a first overview on Strategic Business Analytics topics. We’ll discuss a wide variety of applications of Business Analytics. From Marketing to Supply Chain or Credit Scoring and HR Analytics, etc. We’ll cover many different data analytics techniques, each time explaining how to be relevant for your business. We’ll pay special attention to how you can produce convincing, actionable, and efficient insights. We’ll also present you with different data analytics tools to be applied to different types of issues. By doing so, we’ll help you develop four sets of skills needed to leverage value from data: Analytics, IT, Business and Communication. 
G 

Getting and Cleaning Data (Coursera) 
Learn how to gather and clean data from a variety of sources. This is the third course in the Johns Hopkins Data Science Specialization. 
H 

Hands-On Data Science with R 
This web site provides extensive material for the Data Scientist. Togaware also provides a unique offering of in-situ hands-on training. We offer traditional out-of-office training courses, but we find more effective learning can occur hands-on in-situ. We offer one of the world’s leading Data Scientists to work alongside and mentor your staff over one or two weeks. We work confidentially on actual projects, with training “on-the-job” provided by a professional with 30 years’ experience in the industry and author of the best-selling book on Data Mining with Rattle and R. 
How to work with Quandl in R (Datacamp) 
Quandl is the easiest way to find data on the internet. It offers millions of free and open financial, economic, and social datasets, aggregated from hundreds of top sources, in a user-friendly format, including options for embeddable charts and data transformations. In this interactive tutorial you will learn how to effortlessly pull any of Quandl’s data into R for quick and easy analysis! 
I 

Importing Data Into R (Datacamp) 
Importing your data into R to start your analyses: it should be the easiest step. Unfortunately, this is almost never the case. Data can come in all sorts of formats, going from flat files and statistical software files to databases and web data. Knowing which approach to use is key to getting started with the actual analysis. In this course, you will learn all the basics on how to get up and running in no time! 
Intermediate R (Datacamp) 
The intermediate R course is the logical next stop on your journey in the R programming language. Learn about conditional statements, loops and functions to power your own R scripts. Next, you can make your R code more efficient and readable using the apply functions. Finally, the utilities chapter gets you up to speed with regular expressions in R, data structure manipulations and times and dates. 
Intro to Computer Science (Udacity) 
In this introductory course, you’ll learn and practice key computer science concepts by building your own versions of popular web applications. You’ll learn Python, a powerful, easy-to-learn, and widely used programming language, and you’ll explore fundamental computer science concepts, as you build your own search engine and social network. 
Intro to Data Science (Udacity) 
The Introduction to Data Science class will survey the foundational topics in data science, namely: • Data Manipulation • Data Analysis with Statistics and Machine Learning • Data Communication with Information Visualization • Data at Scale – Working with Big Data The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science. 
Intro to Descriptive Statistics (Udacity) 
Statistics is an important field of math that is used to analyze, interpret, and predict outcomes from data. Descriptive statistics will teach you the basic concepts used to describe data. This is a great beginner course for those interested in Data Science, Economics, Psychology, Machine Learning, Sports analytics and just about any other field. 
Intro to Hadoop and MapReduce (Udacity) 
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.
Intro to Inferential Statistics (Udacity) 
Inferential statistics allows us to draw conclusions from data that might not be immediately obvious. This course focuses on enhancing your ability to develop hypotheses and use common tests such as t-tests, ANOVA tests, and regression to validate your claims.
Intro to Python for Data Science (Datacamp) 
Python is a general-purpose programming language that is becoming more and more popular for data science. Companies worldwide are using Python to harvest insights from their data and get a competitive edge. Unlike other Python tutorials, this course focuses on Python specifically for data science. You will learn about powerful ways to store and manipulate data as well as cool data science tools to start your own analyses.
Intro to Statistics (Udacity) 
Statistics is about extracting meaning from data. In this class, we will introduce techniques for visualizing relationships in data and systematic techniques for understanding the relationships using mathematics. 
Introduction to Big Data with Apache Spark (edx) 
Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark. This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (part of Apache Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.
Introduction to Computational Finance and Financial Econometrics (Coursera) 
Learn mathematical and statistical tools and techniques used in quantitative and computational finance. Use the open source R statistical programming language to analyze financial data, estimate statistical models, and construct optimized portfolios. Analyze real world data and solve real world problems. 
Introduction to Computational Finance and Financial Econometrics (Datacamp) 
Get an in-depth insight into the mathematical and statistical tools and techniques used in quantitative and computational finance!
Introduction to Data Science (Coursera) 
Join the data revolution. Companies are searching for data scientists. This specialized field demands multiple skills not easy to obtain through conventional curricula. Introduce yourself to the basics of data science and leave armed with practical experience extracting value from big data. 
Introduction to Data Science with R  How to Manipulate, Visualize, and Model Data with the R Language. Learn practical skills for visualizing, transforming, and modeling data in R. This comprehensive video course shows you how to explore and understand data, as well as how to build linear and nonlinear models in the R language and environment. It’s ideal whether you’re a nonprogrammer with no data science experience, or a data… 
Introduction to Linear Models and Matrix Algebra (edx) 
We will review linear algebra, including matrix notation and the concept of projections, which underlies many of the current tools for analyzing large-dimensional data. We will then use linear models to represent differences between experimental units and perform statistical inference on these differences. Topics: • Linear algebra: matrix notation, projections • Linear models
Introduction to Machine Learning (Datacamp) 
This course is perfect for those who have a solid basis in R and statistics, but are completely new to machine learning. After a broad overview of the discipline’s most common techniques and applications, you’ll gain more insight into the assessment and training of different machine learning models. The rest of the course is dedicated to a first reconnaissance with three of the most basic machine learning tasks: classification, regression and clustering. 
Introduction to Natural Language Processing (Coursera) 
This course provides an introduction to the field of Natural Language Processing. It includes relevant background material in Linguistics, Mathematics, Statistics, and Computer Science. Some of the topics covered in the class are Text Similarity, Part of Speech Tagging, Parsing, Semantics, Question Answering, Sentiment Analysis, and Text Summarization. The course includes quizzes, programming assignments, and a final exam. 
Introduction to Probability Theory (Saylor) 
This course will introduce you to the fundamentals of probability theory and random processes. The theory of probability was originally developed in the 17th century by two great French mathematicians, Blaise Pascal and Pierre de Fermat, to understand gambling. Today, the theory of probability has found many applications in science and engineering. Engineers use data from manufacturing processes to sample characteristics of product quality in order to improve the products being produced. Pharmaceutical companies perform experiments to determine the effect of a drug on humans and use the results to make decisions about treatment of illnesses, while economists observe the state of the economy over periods of time and use the information to forecast the economic future. In this course, you will learn the basic terminology and concepts of probability theory, including random experiments, sample spaces, discrete distribution, probability density function, expected values, and conditional probability. You will also learn about the fundamental properties of several special distributions, including binomial, geometric, normal, exponential, and Poisson distributions, as well as how to use them to model real-life situations and solve applied problems.
Introduction to R (Datacamp) 
With over 2 million users worldwide, R is rapidly becoming the leading programming language in statistics and data science. Every year, the number of R users grows by 40%, and an increasing number of organizations are using it in their day-to-day activities. In this introduction to R, you will master the basics of this beautiful open source language such as factors, lists and data frames. With the knowledge gained in this course, you will be ready to undertake your first very own data analysis.
Introduction to R (Datacamp) 
In this introduction to R, you will master the basics of this beautiful open source language. We’ll take you on trips to Las Vegas and galaxies far far away. Basic topics such as factors, lists and data frames will be covered. After finishing this introductory R course, you’ll master some very valuable R skills and are ready to undertake your first very own data analysis. 
Introduction to Recommender Systems (Coursera) 
Recommender systems have changed the way people find products, information, and even other people. They study patterns of behavior to know what someone will prefer from among a collection of things he has never experienced. The technology behind recommender systems has evolved over the past 20 years into a rich collection of tools that enable the practitioner or researcher to develop effective recommenders. We will study the most important of those tools, including how they work, how to use them, how to evaluate them, and their strengths and weaknesses in practice. The algorithms we will study include content-based filtering, user-user collaborative filtering, item-item collaborative filtering, dimensionality reduction, and interactive critique-based recommenders. The approach will be hands-on, with six two-week projects, each of which will involve implementation and evaluation of some type of recommender.
Introduction to Statistics (Saylor) 
In this course, you will look at the properties behind the basic concepts of probability and statistics and focus on applications of statistical knowledge. You will learn about how statistics and probability work together. The subject of statistics involves the study of methods for collecting, summarizing, and interpreting data. Statistics formalizes the process of making decisions, and this course is designed to help you use statistical literacy to make better decisions. Note that this course has applications for the natural sciences, economics, computer science, finance, psychology, sociology, criminology, and many other fields. We read data in articles and reports every day. After finishing this course, you should be comfortable evaluating an author’s use of data. You will be able to extract information from articles and display that information effectively. You will also be able to understand the basics of how to draw statistical conclusions. This course will begin with descriptive statistics and the foundation of statistics. You will then learn about probability and random distributions, the latter of which enables us to work with several aspects of random events and their applications. Finally, you will examine a number of ways to investigate the relationships between various characteristics of data. By the end of this course, you should have a grasp on what statistics represent, how to use them to organize and display data, and how to test data to make effective conclusions. 
Introduction to Statistics (Saylor) 
If you invest in financial markets, you may want to predict the price of a stock six months from now on the basis of company performance measures and other economic factors. As a college student, you may be interested in knowing how the mean starting salary of a college graduate depends on GPA. These are just some examples that highlight how statistics are used in our modern society. To figure out the desired information for each example, you need data to analyze. The purpose of this course is to introduce you to the subject of statistics as a science of data. Data abound in this information age; extracting useful knowledge and gaining a sound understanding of complex data sets is more of a challenge. In this course, we will focus on the fundamentals of statistics, which may be broadly described as the techniques to collect, clarify, summarize, organize, analyze, and interpret numerical information. This course will begin with a brief overview of the discipline of statistics and will then quickly focus on descriptive statistics, introducing graphical methods of describing data. You will learn about combinatorial probability and random distributions, the latter of which serves as the foundation for statistical inference. On the side of inference, we will focus on both estimation and hypothesis testing issues. We will also examine the techniques to study the relationship between two or more variables; this is known as regression. By the end of this course, you should gain a sound understanding about what statistics represent, how to use statistics to organize and display data, and how to draw valid inferences based on data by using appropriate statistical tools.
K 

Kaggle R Tutorial on Machine Learning (Datacamp) 
Always wanted to compete in a Kaggle competition but not sure you have the right skillset? This interactive tutorial by Kaggle and DataCamp on Machine Learning offers the solution. Step by step, you will learn through fun coding exercises how to predict survival rate for Kaggle’s Titanic competition using Machine Learning techniques. Upload your results and see your ranking go up!
L 

Learn Data Science Fundamentals (Coursera) 
Learn SAS or Python programming, expand your knowledge of analytical methods and applications, and conduct original research to inform complex decisions. The Data Analysis and Interpretation Specialization takes you from data novice to data analyst in just four project-based courses. You’ll learn to apply basic data science tools and techniques, including data visualization, regression modeling, and machine learning. Throughout the Specialization, you will analyze research questions of your choice and summarize your insights. In the final Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. These instructors are here to create a warm and welcoming place at the table for everyone. Everyone can do this, and we are building a community to show the way. • Data Management and Visualization • Data Analysis Tools • Regression Modeling in Practice • Machine Learning for Data Analysis
Learn numerics, science, and data with Python (Scipy Lecture Notes) 
Tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques. Each chapter corresponds to a 1 to 2 hour course, with the level of expertise increasing from beginner to expert.
Learn SQL with Codecademy and Periscope (Codecademy) 
SQL, ‘Structured Query Language’, is a programming language designed to manage data stored in relational databases. SQL operates through simple, declarative statements. This keeps data accurate and secure, and helps maintain the integrity of databases, regardless of size. The SQL language is widely used today across web frameworks and database applications. Knowing SQL gives you the freedom to explore your data, and the power to make better decisions. By learning SQL, you will also learn concepts that apply to nearly every data storage system. 
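As a rough illustration of what "simple, declarative statements" means in the description above, here is a minimal sketch using Python's built-in sqlite3 module. The `courses` table and its rows are invented for this example and are not taken from the course itself.

```python
import sqlite3

# Hypothetical in-memory table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, provider TEXT)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [
        ("Intro to Statistics", "Udacity"),
        ("Linear Algebra", "Saylor"),
        ("Intro to Data Science", "Udacity"),
    ],
)

# Declarative: the query states which result is wanted (a count per
# provider), not how the database should go about computing it.
rows = conn.execute(
    "SELECT provider, COUNT(*) FROM courses GROUP BY provider ORDER BY provider"
).fetchall()
print(rows)
conn.close()
```

The same SELECT statement would work largely unchanged on other relational databases, which is the portability the description alludes to.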
Linear Algebra (Saylor) 
This course is an introduction to linear algebra. It has been argued that linear algebra constitutes half of all mathematics. Whether or not everyone would agree with that, it is certainly true that practically every modern technology relies on linear algebra to simplify the computations required for Internet searches, 3D animation, coordination of safety systems, financial trading, air traffic control, and everything in between. Linear algebra can be viewed either as the study of linear equations or as the study of vectors. It is tied to analytic geometry; practically speaking, this means that almost every fact you will learn in this course has a picture associated with it. Learning to connect the facts with their geometric interpretation will be very useful for you. The book which is used in the course focuses both on the theoretical aspects as well as the applied aspects of linear algebra. As a result, you will be able to learn the geometric interpretations of many of the algebraic concepts in this subject. Additionally, you will learn some standard techniques in numerical linear algebra, which allow you to deal with matrices that might show up in applications. Toward the end, the more abstract notions of vector spaces and linear transformations on vector spaces will be introduced. In college algebra, one becomes familiar with the equation of a line in two-dimensional space: y = mx+b. Lines can be generalized to planes and “hyperplanes” in many-dimensional space; these objects are all described by linear relations. Linear transformations are ways of rotating, dilating, or otherwise modifying the underlying space so that these linear objects are not deformed. Linear algebra, then, is the theory and practice of analyzing linear relations and their behavior under linear transformations. 
According to the second interpretation listed above, linear algebra focuses on vectors, which are mathematical objects in many-dimensional space characterized by magnitude and direction. You can also think of them as a string of coordinates. Each string may represent the state of all the stocks traded in the DOW, the position of a satellite, or some other piece of data with multiple components. Linear transformations change the magnitude and direction of vectors: they transform the coordinates without changing their fundamental relationships with one another. Linear transformations are often written in a compact and easily readable way by using matrices. Linear algebra may at first seem dry and difficult to visualize, but it is one of the most useful subjects you can learn if you wish to become a businessperson, a physicist, a computer programmer, an engineer, or a mathematician.
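To make the idea of writing linear transformations as matrices concrete, here is a small sketch in plain Python. The helper `apply` and the example vectors are made up for illustration and do not come from the course materials. It rotates and dilates a 2D vector, then checks the defining linearity property T(a + b) = T(a) + T(b).

```python
import math

def apply(matrix, vector):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-component vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

theta = math.pi / 2  # rotate by 90 degrees counter-clockwise
rotate = [[math.cos(theta), -math.sin(theta)],
          [math.sin(theta),  math.cos(theta)]]
scale = [[2.0, 0.0],
         [0.0, 2.0]]  # dilate by a factor of 2

v = [1.0, 0.0]
rotated = apply(rotate, v)  # direction changes, magnitude stays the same
scaled = apply(scale, v)    # magnitude doubles, direction stays the same

# Linearity: transforming a sum equals summing the transformed vectors.
a, b = [1.0, 2.0], [3.0, -1.0]
lhs = apply(rotate, [a[0] + b[0], a[1] + b[1]])
rhs = [x + y for x, y in zip(apply(rotate, a), apply(rotate, b))]

print(rotated, scaled, lhs, rhs)
```

Because of floating-point rounding, the rotated vector comes out only approximately equal to (0, 1), so comparisons should use a tolerance rather than exact equality.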
List of Free Online R Tutorials (Collection) 
Following the post on FREE online R tutorials from universities, I have received many emails suggesting more tutorials. However, some of these tutorials are not hosted by academic institutions, so I decided to create this post to list them.
M 

Machine Learning (Coursera) 
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI. This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
Machine Learning (Coursera) 
Why write programs when the computer can instead learn them from data? In this class you will learn how to make this happen, from the simplest machine learning algorithms to quite sophisticated ones. Enjoy! 
Machine Learning (Coursera) 
Learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. 
Machine Learning (OpenClassroom) 
In this course, you’ll learn about some of the most widely used and successful machine learning techniques. You’ll have the opportunity to implement these algorithms yourself, and gain practice with them. You will also learn some practical hands-on tricks and techniques (rarely discussed in textbooks) that help get learning algorithms to work well. This is an ‘applied’ machine learning class, and we emphasize the intuitions and know-how needed to get learning algorithms to work in practice, rather than the mathematical derivations. Familiarity with programming, basic linear algebra (matrices, vectors, matrix-vector multiplication), and basic probability (random variables, basic properties of probability) is assumed. Basic calculus (derivatives and partial derivatives) would be helpful and would give you additional intuitions about the algorithms, but isn’t required to fully complete this course.
Machine Learning  Here is a great learning resource for anyone wishing to dive into the field of machine learning – a complete class “Machine Learning” from Spring 2011 at Carnegie Mellon University. 
Machine Learning 101  Great data science resources for self-learners
Machine Learning: Reinforcement Learning (Udacity) 
Reinforcement Learning is the area of Machine Learning concerned with the actions that software agents ought to take in a particular environment in order to maximize rewards. You can apply Reinforcement Learning to robot control, chess, backgammon, checkers, and other activities that a software agent can learn. Reinforcement Learning uses behaviorist psychology in order to achieve reward maximization. This course includes important Reinforcement Learning approaches like Markov Decision Processes and Game Theory. Please refer to the Syllabus for a detailed breakdown of topics. 
Machine Learning: Supervised Learning (Udacity) 
This course covers Supervised Learning, a machine learning task that makes it possible for your phone to recognize your voice, your email to filter spam, and for computers to learn a bunch of other cool stuff. Supervised Learning is an important component of all kinds of technologies, from stopping credit card fraud, to finding faces in camera images, to recognizing spoken language. Our goal is to give you the skills that you need to understand these technologies and interpret their output, which is important for solving a range of data science problems. And for surviving a robot uprising. 
Machine Learning: Unsupervised Learning (Udacity) 
Ever wonder how Netflix can predict what movies you’ll like? Or how Amazon knows what you want to buy before you do? The answer can be found in Unsupervised Learning! Closely related to pattern recognition, Unsupervised Learning is about analyzing data and looking for patterns. It is an extremely powerful tool for identifying structure in data. This course focuses on how you can use Unsupervised Learning approaches — including randomized optimization, clustering, and feature selection and transformation — to find structure in unlabeled data. 
Mining Massive Datasets (Coursera) 
This class teaches algorithms for extracting models and other information from very large amounts of data. The emphasis is on techniques that are efficient and that scale well. http://www.mmds.org 
Model Building and Validation (Udacity) 
This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning happens to be a small part of this process. The model building process involves setting up ways of collecting data, understanding and paying attention to what is important in the data to answer the questions you are asking, and finding a statistical, mathematical, or simulation model to gain understanding and make predictions. All of these things are equally important, and model building is a crucial skill to acquire in every field of science. The process stays true to the scientific method, making what you learn through your models useful for gaining an understanding of whatever you are investigating as well as for making predictions that hold up when tested. We will take you on a journey through building various models. This process involves asking questions, gathering and manipulating data, building models, and ultimately testing and evaluating them.
Model Thinking (Coursera) 
We live in a complex world with diverse people, firms, and governments whose behaviors aggregate to produce novel, unexpected phenomena. We see political uprisings, market crashes, and a never-ending array of social trends. How do we make sense of it? Models. Evidence shows that people who think with models consistently outperform those who don’t. Moreover, people who think with lots of models outperform people who use only one. Why do models make us better thinkers? Models help us to better organize information – to make sense of that fire hose or hairball of data (choose your metaphor) available on the Internet. Models improve our abilities to make accurate forecasts. They help us make better decisions and adopt more effective strategies. They even can improve our ability to design institutions and procedures.
N 

Networked Life (Coursera) 
Networked Life will explore recent scientific efforts to explain social, economic and technological structures – and the way these structures interact – on many different scales, from the behavior of individuals or small groups to that of complex networks such as the Internet and the global economy. 
Neural Networks (Hugo Larochelle) 
This is a graduate-level course, which covers basic neural networks as well as more advanced topics, including: • Deep learning. • Conditional random fields. • Restricted Boltzmann machines. • Autoencoders. • Sparse coding. • Convolutional networks. • Vector word representations. • and many more…
Neural Networks for Machine Learning (Coursera) 
Learn about artificial neural networks and how they’re being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc. We’ll emphasize both the basic algorithms and the practical tricks needed to get them to work well. This course contains the same content presented on Coursera beginning in 2013. It is not a continuation or update of the original course. It has been adapted for the new platform. 
Neural Networks for Machine Learning (Coursera) 
Learn about artificial neural networks and how they’re being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc. We’ll emphasize both the basic algorithms and the practical tricks needed to get them to work well. 
O 

Online Experiments for Computational Social Science  This tutorial teaches attendees how to design, plan, implement, and analyze online experiments. First, we review basic concepts in causal inference and motivate the need for experiments. Then we will discuss basic statistical tools to help plan experiments: exploratory analysis, power calculations, and the use of simulation in R. We then discuss statistical methods to estimate causal quantities of interest and construct appropriate confidence intervals. Particular attention will be given to scalable methods suitable for ‘big data’, including working with weighted data and clustered bootstrapping. We then discuss how to design and implement online experiments using PlanOut, an open-source toolkit for advanced online experimentation used at Facebook. We will show how basic ‘A/B tests’, within-subjects designs, as well as more sophisticated experiments can be implemented. We demonstrate how experimental designs from social computing literature can be implemented, and also review in detail two very large field experiments conducted at Facebook using PlanOut. Finally, we will discuss issues with logging and common errors in the deployment and analysis of experiments. Attendees will be given code examples and participate in the planning, implementation, and analysis of a Web application using Python, PlanOut, and R.
OpenIntro – Statistics  The mission of OpenIntro is to make educational products that are free and transparent and that lower barriers to education. As a result, we have written an open-source (free) textbook that has been used at several universities, developed course management software that’s free for teachers and students alike, and continue to develop other unique, innovative products.
P 

Passion Driven Statistics (Coursera) 
With existing data, you will develop skills in data analysis and basic statistics by exploring your own research question. 
Pattern Discovery in Data Mining (Coursera) 
Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern discovery in data mining. We will also introduce methods for pattern-based classification and some interesting applications of pattern discovery. This course provides you the opportunity to learn skills and content to practice and engage in scalable pattern discovery methods on massive transactional data, discuss pattern evaluation measures, and study methods for mining diverse kinds of patterns, sequential patterns, and subgraph patterns.
Practical Learning Analytics (Coursera) 
Everyone involved in higher education has questions. Students want to know how they’re doing and which classes they should take. Faculty members want to understand their students’ backgrounds and to learn whether their teaching techniques are effective. Staff members want to be sure the advice they provide is appropriate and find out whether college requirements accomplish their goals. Administrators want to explore how all of their students and faculty are doing and to anticipate emerging changes. The public wants to know what happens in college and why. 
Practical Machine Learning (Coursera) 
Learn the basic components of building and applying prediction functions with an emphasis on practical applications. This is the eighth course in the Johns Hopkins Data Science Specialization. 
Probabilistic Graphical Models (Coursera) 
In this class, you will learn the basics of the PGM representation and how to construct them, using both human knowledge and machine learning techniques. 
Probability (Coursera) 
The renowned mathematical physicist Pierre-Simon, marquis de Laplace wrote in his opus on probability in 1812 that ‘the most important questions of life are, for the most part, really only problems in probability’. His words ring particularly true today in this the century of ‘big data’. This introductory course takes us through the development of a modern, axiomatic theory of probability. But, unusually for a technical subject, the material is presented in its lush and glorious historical context, the mathematical theory buttressed and made vivid by rich and beautiful applications drawn from the world around us. The student will see surprises in election-day counting of ballots, a historical wager that the sun will rise tomorrow, the folly of gambling, the sad news about lethal genes, the curiously persistent illusion of the hot hand in sports, the unreasonable efficacy of polls and its implications to medical testing, and a host of other beguiling settings. A curious individual taking this as a standalone course will emerge with a nuanced understanding of the chance processes that surround us and an appreciation of the colourful history and traditions of the subject. And for the student who wishes to study the subject further, this course provides a sound mathematical foundation for courses at the advanced undergraduate or graduate levels.
Process Mining: Data science in Action (Coursera) 
Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational processes (organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. All of these applications have in common that dynamic behavior needs to be related to process models. Hence, we refer to this as ‘data science in action’. The course explains the key analysis techniques in process mining. Participants will learn various process discovery algorithms. These can be used to automatically learn process models from raw event data. Various other process analysis techniques that use event data will be presented. Moreover, the course will provide easy-to-use software, real-life data sets, and practical skills to directly apply the theory in a variety of application domains. 
Programming for Everybody (Python) (Coursera) 
This course aims to teach everyone to learn the basics of programming computers using Python. The course has no prerequisites and avoids all but the simplest mathematics. Anyone with moderate computer experience should be able to master the materials in this course. 
PyCon 2015 Scikit-Learn Tutorial (IPython) 
This is the main index of the PyCon 2015 Introduction to Scikit-Learn tutorial, presented by Jake VanderPlas. The following links are to notebooks containing the tutorial materials. Note that many of these require files that are in the directory structure of the GitHub repository in which they are contained. We will not have time to cover all this material, but I left it here for reference. 
Python Introduction  The Python Introduction tutorial explains how and where to start writing Python for the server side. 
R 

R for SAS, SPSS and STATA Users (Datacamp) 
If you already know SAS, SPSS or Stata, you don’t need to spend time learning how to analyze data; you need a course that focuses on translating your knowledge into R. This comprehensive course introduces R jargon using the language you’re familiar with. 
R Online Learning (RStudio Collection)  A wealth of tutorials, articles, and examples exist to help you learn R and its extensions. Scroll down or click a link below for a curated guide to learning R and its extensions. 
R Programming (Coursera) 
Learn how to program in R and how to use R for effective data analysis. This is the second course in the Johns Hopkins Data Science Specialization. 
R Tutorial (The Analysis Factor) 
The statistical programming language R is becoming a popular means for analyzing data. But it’s not always easy to use. We have a number of resources about learning and using R, including a several-part tutorial blog series. 
R Tutorials  Basic R code examples. 
R: Getting Started with Data Science  This short tutorial will not only guide you through some basic data analysis methods but it will also show you how to implement some of the more sophisticated techniques available today. We will look into traffic accident data from the National Highway Traffic Safety Administration and try to predict fatal accidents using state-of-the-art statistical learning techniques. 
Regression Models (Coursera) 
Learn how to use regression models, the most important statistical analysis tool in the data scientist’s toolkit. This is the seventh course in the Johns Hopkins Data Science Specialization. 
Reinforcement Learning (Udacity) 
You should take this course if you have an interest in machine learning and the desire to engage with it from a theoretical perspective. Through a combination of classic papers and more recent work, you will explore automated decision-making from a computer-science perspective. You will examine efficient algorithms, where they exist, for single-agent and multi-agent planning as well as approaches to learning near-optimal decisions from experience. At the end of the course, you will replicate a result from a published paper in reinforcement learning. 
Reinforcement Learning Lecture Videos  Lecture 1: Introduction to Reinforcement Learning; Lecture 2: Markov Decision Process; Lecture 3: Planning by Dynamic Programming; Lecture 4: Model-Free Prediction; Lecture 5: Model-Free Control; Lecture 6: Value Function Approximation; Lecture 7: Policy Gradient Methods; Lecture 8: Integrating Learning and Planning; Lecture 9: Exploration and Exploitation 
Reporting with R Markdown (Datacamp) 
Write reports quickly and effectively with the R Markdown package. Generate reports straight from your R code, documenting your work – and its results – as an HTML, PDF, slideshow or Microsoft Word document. 
Reproducible Research (Coursera) 
Learn the concepts and tools behind reporting modern data analyses in a reproducible manner. This is the fifth course in the Johns Hopkins Data Science Specialization. 
S 

SAP HANA Academy (SAP) 
Welcome to the new landing page for SAP HANA Academy. The same great videos but with an easier to use interface with intuitive playlists and full text search. We now have over 800 free tutorial videos answering your questions on working with SAP HANA. 
Scalable Machine Learning (edx) 
Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘Big Data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines. This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Apache Spark, a cluster computing system well-suited for large-scale machine learning tasks. You will implement scalable algorithms for fundamental statistical models (linear regression, logistic regression, matrix factorization, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience. 
Scientific Computing (Coursera) 
Investigate the flexibility and power of project-oriented computational analysis, and enhance communication of information by creating visual representations of scientific data. 
Semantics Approach to Big Data and Event Processing  Variety, Velocity, Volume and Veracity are the four Vs of Big Data. Most of the available technologies have shown how to handle Volume. However, with the growing number of streaming data sources, the Velocity problem is more relevant than ever. Moreover, the Veracity and especially the Variety problems have made the challenge harder. This course focuses on two aspects of the Big Data problem, Velocity and Variety, and shows how streaming data and semantic technologies can enable efficient and effective stream processing for advanced application development. 
Short tutorials all data scientists should read (Collection) 
The links to core data science concepts 
Social and Economic Networks: Models and Analysis (Coursera) 
Learn how to model social and economic networks and their impact on human behavior. How do networks form, why do they exhibit certain patterns, and how does their structure impact diffusion, learning, and other behaviors? We will bring together models and techniques from economics, sociology, math, physics, statistics and computer science to answer these questions. 
Social Network Analysis (Coursera) 
This course will use social network analysis, both its theory and computational tools, to make sense of the social and information networks that have been fueled and rendered accessible by the internet. 
Statistical Data Mining Tutorials  by Andrew W. Moore 
Statistical Inference (Coursera) 
Learn how to draw conclusions about populations or scientific truths from data. This is the sixth course in the Johns Hopkins Data Science Course Track. 
Statistical Learning (Stanford) 
This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical). This is not a math-heavy class, so we try and describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter. 
Statistics II (Saylor) 
This course will introduce you to a number of statistical tools and techniques that are routinely used by modern statisticians for a wide variety of applications. First, we will review basic knowledge and skills that you learned in MA121: Introduction to Statistics. Units 2-5 will introduce you to new ways to design experiments and to test hypotheses, including multiple and nonlinear regression and nonparametric statistics. You will learn to apply these methods to building models to analyze complex, multivariate problems. You will also learn to write scripts to carry out these analyses in R, a powerful statistical programming language. The last unit is designed to give you a grand tour of several advanced topics in applied statistics. 
Statistics One (Coursera) 
Statistics One is a comprehensive yet friendly introduction to statistics. 
Statistics: Making Sense of Data (Coursera) 
This course is an introduction to the key ideas and principles of the collection, display, and analysis of data to guide you in making valid and appropriate conclusions about the world. 
statsTeachR  statsTeachR is an open-access, online repository of modular lesson plans, a.k.a. “modules”, for teaching statistics using R at the undergraduate and graduate level. Each module focuses on teaching a specific statistical concept. The modules range from introductory lessons in statistics and statistical computing to more advanced topics in statistics and biostatistics. 
Survey Analysis in R (statistics.com) 
The purpose of this online course, “Survey Analysis in R”, is to teach survey researchers who are familiar with R how to use it in survey research. The course uses the Survey package for R, which was created by the instructor. You will learn how to describe to R the design of a survey; both simple and complex designs are covered. You will then learn how to get R to produce descriptive statistics and graphs with the survey data, and also to perform regression analysis on the data. 
T 

Tackle Real Data Challenges (Coursera) 
Learn scalable data management, evaluate big data technologies, and design effective visualizations. This Specialization covers intermediate topics in data science. You will gain hands-on experience with scalable SQL and NoSQL data management solutions, data mining algorithms, and practical statistical and machine learning concepts. You will also learn to visualize data and communicate results, and you’ll explore legal and ethical issues that arise in working with big data. In the final Capstone Project, developed in partnership with the digital internship platform Coursolve, you’ll apply your new skills to a real-world data science project. 
Text Mining and Analytics (Coursera) 
This course will cover the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be generally applied to arbitrary text data in any natural language with no or minimum human effort. Detailed analysis of text data requires understanding of natural language text, which is known to be a difficult task for computers. However, a number of statistical approaches have been shown to work well for the “shallow” but robust analysis of text data for pattern finding and knowledge discovery. You will learn the basic concepts, principles, and major algorithms in text mining and their potential applications. 
Text Retrieval and Search Engines (Coursera) 
Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining applications involving text data for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that are relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern. You will learn the basic concepts, principles, and the major techniques in text retrieval, which is the underlying science of search engines. 
The Data Scientist’s Toolbox (Coursera) 
Get an overview of the data, questions, and tools that data analysts and data scientists work with. This is the first course in the Johns Hopkins Data Science Specialization. 
Two minute Videos: How to do stuff in R  For those of us who prefer to learn by watching and listening. 
U 

Unsupervised Feature Learning and Deep Learning (OpenClassroom) 
Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. In this course, you’ll learn about methods for unsupervised feature learning and deep learning, which automatically learn a good representation of the input from unlabeled data. You’ll also pick up the ‘hands-on,’ practical skills and tricks-of-the-trade needed to get these algorithms to work well. Basic knowledge of machine learning (supervised learning) is assumed, though we’ll quickly review logistic regression and gradient descent. 
Unsupervised Feature Learning and Deep Learning Tutorial  This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. By working through it, you will also get to implement several feature learning/deep learning algorithms, get to see them work for yourself, and learn how to apply/adapt these ideas to new problems. This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, gradient descent). If you are not familiar with these ideas, we suggest you go to this Machine Learning course and complete sections II, III, IV (up to Logistic Regression) first. 
W 

Web Intelligence and Big Data (Coursera) 
This course is about building ‘web-intelligence’ applications exploiting big data sources arising from social media, mobile devices and sensors, using new big-data platforms based on the ‘map-reduce’ parallel programming paradigm. In the past, this course has been offered at the Indian Institute of Technology Delhi as well as the Indraprastha Institute of Information Technology Delhi. 
Working with the Predictive Analysis Library (SAP) 
In this section, we examine some of the specific analytic functions available within the SAP HANA Predictive Analysis Library. 