Open Data Science Conference
The #ODSc Conference brings together the most influential practitioners, innovators, and thought leaders in the open source and data science fields to discover, develop, and accelerate the adoption of open source in data science. Open source data science is revolutionizing the way we analyze information, and to tap into the latest innovations and opportunities, you will want to join us.
Hacker’s guide to Neural Networks
Patterns for Information Visualization
About two years ago I became very interested in UX and everything about it. This interest eventually evolved into a strong awareness of how information is presented, visually in particular. One just can’t help thinking in terms of visualization when reading Tufte, Cleveland and Bertin. Ideas come pouring in all the time: how to make things more visual, easier to grasp, clearer (in our product, in particular). I will try to share this feeling and tell more about the principles of information visualization based on some very impressive stories. I beg your pardon for a couple of boring definitions. There are no jokes in this article, intentionally. It’s a deadly serious business, so summon all your patience and read on. Disclaimer: the article is quite lengthy. Even so, it’s my sincere hope that you will get through it in no time, as I can’t stress enough how engaging and fascinating the subject is.
Simple solutions to make videos with R
I’m talking about streaming data displayed in video rather than chart format, like 200 scatter plots continuously updated, as in my recent video series from chaos to clusters. In this article, I explain and illustrate how to produce these videos. You don’t need to be a data scientist to understand.
Can you Be a Growth Hacker Without Being a Data Scientist?
Growth hacking is turning out to be one of the fastest-growing fields for data analysts and scientists. Although there is controversy about the term and its specific meaning, the general connotation implies a function, activity or person primarily focused on growing a set of metrics such as users, revenue, visits and profits.
Decision tree vs. linearly separable or non-separable pattern
As part of a series of posts discussing how a machine learning classifier works, I ran a decision tree to classify an XY-plane, trained with XOR patterns or linearly separable patterns.
Predicting Car Prices Part 1: Linear Regression
Let’s walk through an example of predictive analytics using a data set that most people can relate to: prices of cars. In this case, we have a data set with historical Toyota Corolla prices along with related car attributes.
Predicting Car Prices Part 2: Using Neural Network
This is part two of the series. In part one, we used a linear regression model to predict the prices of used Toyota Corollas. There is some overlap in the material for those reading this post for the first time. In this post, we will use neural networks! If you’ve read part 1 of this series, you can safely skip to the Neural Network analysis section, where I apply neural networks to the same data set.
A function to help graphical model checks of lm and ANOVA
Even though linear models (LMs) are very simple models that form the basis of many more complex ones, they still rest on assumptions that, if not met, render any interpretation of the model plainly wrong. In my field of research most people were taught to check ANOVA assumptions using tests like Levene & co. This is, however, not the best way to check whether a model meets its assumptions, because p-values depend on the sample size: with a small sample we will almost never reject the null hypothesis, while with a big sample even small deviations will lead to significant p-values (discussion). As ANOVA and linear models are two different ways to look at the same model (explanation), we can check ANOVA assumptions using the graphical checks from a linear model. In R this is easily done using plot(model), but people often ask me how much deviation makes me reject a model. One easy way to see if the model-checking graphs are off the charts is to simulate data from the model, fit the model to these newly simulated data, and compare the graphical checks from the simulated data with those from the real data. If you cannot tell the simulated and the real data apart, then your model is fine; if you can, then try again!
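The simulate-and-refit idea can be sketched as follows. This is a minimal illustration in Python rather than R, and it only produces the residual sets you would plot; the helper names (`fit_line`, `residuals`, `simulate_residual_sets`), the single-predictor model, the normal error assumption and the toy data are all assumptions made for the example, not the function from the original post.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y, a, b):
    """Observed minus fitted values."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def simulate_residual_sets(x, y, n_sims=8, seed=0):
    """Fit the model, simulate new responses from it, refit each time,
    and return the observed residuals plus one residual set per simulation."""
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    res = residuals(x, y, a, b)
    # Residual standard deviation with n - 2 degrees of freedom
    sigma = (sum(r * r for r in res) / (len(x) - 2)) ** 0.5
    sims = []
    for _ in range(n_sims):
        # New responses drawn from the fitted model (assumes normal errors)
        y_sim = [a + b * xi + rng.gauss(0, sigma) for xi in x]
        a_s, b_s = fit_line(x, y_sim)
        sims.append(residuals(x, y_sim, a_s, b_s))
    return res, sims

# Toy data, invented for illustration
data_rng = random.Random(1)
x = [i / 2 for i in range(40)]
y = [2.0 + 0.5 * xi + data_rng.gauss(0, 1) for xi in x]

observed, simulated = simulate_residual_sets(x, y)
```

Plotting `observed` and each entry of `simulated` against the fitted values in a grid of panels gives the comparison described above: if the panel with the real residuals does not stand out from the simulated ones, the assumptions look reasonable.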
NASDAQ 100 Couples
Today, my experiment deals with the quantmod package, which lets you play at being a quant for a while. I download the daily quotes of NASDAQ 100 companies and measure distances between each pair of companies. Distance is based on the cross-correlation between two series, so highly correlated series (not exceeding a maximum lag) are closer than weakly correlated ones. You can read a good description of this distance here. Since NASDAQ 100 contains 107 companies, I calculate distances for 5,671 different couples.
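The number of couples follows from counting unordered pairs, n(n - 1)/2. A quick check in Python (the post itself works in R), where the labels are placeholders and not real tickers:

```python
from itertools import combinations
from math import comb

n_companies = 107  # figure quoted in the text

# Number of unordered pairs: n * (n - 1) / 2
n_pairs = comb(n_companies, 2)

# Enumerating the pairs explicitly gives the same count.
# The labels are placeholders for illustration, not real tickers.
labels = [f"T{i:03d}" for i in range(n_companies)]
pairs = list(combinations(labels, 2))

print(n_pairs, len(pairs))
```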
Introducing: Machine Learning in R
Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or ‘predictive modeling’, clustering and finding predictive patterns. These tasks are learned from available data that were observed through experiences or instructions, for example. The hope is that incorporating this experience into the tasks will eventually improve the learning. The ultimate goal is to improve the learning in such a way that it becomes automatic, so that humans like ourselves don’t need to interfere any more. Machine learning has close ties with Knowledge Discovery, Data Mining, Artificial Intelligence and Statistics. Typical applications of machine learning can be classified into scientific knowledge discovery and more commercial applications, ranging from the ‘Robot Scientist’ to anti-spam filtering and recommender systems. This small tutorial is meant to introduce you to the basics of machine learning in R: it will show you how to use R to work with the well-known machine learning algorithm called ‘KNN’ or k-nearest neighbors.
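For readers who have not met k-nearest neighbors before, the core idea fits in a few lines. The following is a minimal sketch in plain Python, not the R code from the tutorial; the function name `knn_predict`, the Euclidean distance, the default k = 3 and the toy data are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration: two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

There is no training step in this sketch: the algorithm simply keeps the labelled points and defers all work to prediction time, which is why the choice of k and of the distance measure matters so much in practice.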