An illustrated introduction to the t-SNE algorithm
This post is an introduction to a popular dimensionality reduction algorithm: t-distributed stochastic neighbor embedding (t-SNE). Developed by Laurens van der Maaten and Geoffrey Hinton, this algorithm has been successfully applied to many real-world datasets. Here, we’ll follow the original paper and describe the key mathematical concepts of the method as applied to a toy dataset (handwritten digits). We’ll use Python and the scikit-learn library.
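As a rough illustration of what that setup might look like, here is a minimal sketch that runs scikit-learn's `TSNE` on the bundled handwritten digits dataset. It is not taken from the post itself: the helper name `embed_digits`, the subset size, and the parameter values (`perplexity=30.0`, `init="pca"`, `random_state=0`) are assumptions chosen only to keep the example small and repeatable.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE


def embed_digits(n_samples=500, random_state=0):
    """Project a subset of the 64-dimensional digit images down to 2D.

    Hypothetical helper for illustration; the parameter values are
    assumptions, not the settings used in the original post.
    """
    digits = load_digits()
    X = digits.data[:n_samples]      # each row is a flattened 8x8 image
    y = digits.target[:n_samples]    # digit labels 0-9, used only for coloring

    tsne = TSNE(
        n_components=2,              # embed into the plane
        perplexity=30.0,             # roughly the effective number of neighbors
        init="pca",                  # start from a PCA projection
        random_state=random_state,   # fix the seed so runs are repeatable
    )
    embedding = tsne.fit_transform(X)
    return embedding, y


embedding, labels = embed_digits()
print(embedding.shape)  # (500, 2)
```

Each row of `embedding` is a 2D point that can be drawn in a scatter plot and colored by its entry in `labels` to see whether images of the same digit end up close together.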
A Big Data Cheat Sheet: What Executives Want to Know
1. What can Hadoop do that my data warehouse can’t?
2. We’re not doing ‘big’ data, so why do we need Hadoop?
3. Is Hadoop enterprise-ready?
4. Isn’t a data lake just the data warehouse revisited?
5. What are some of the pros and cons of a data lake?
Simple Data Science To Maximize Return On Lottery Investment
I recently read this amazing book, where I discovered that we humans are not capable of generating random sequences of numbers by ourselves when we play the lottery. John Haigh demonstrates this by analyzing a sample of 282 draws of the 6/49 UK Lotto. After reading this, I decided to test whether this inability is peculiar to the British population or is shared by Spanish people as well. I am Spanish, so this experiment may bring painful results for me, but here I go.
‘Mail merge’ with RMarkdown
I was working on creating personalized handouts for a workshop. That is, each handout contained some standard text (including some R code) and some fields that were personalized for each participant (login information for our RStudio server). I wanted to do this in RMarkdown so that the R code on the handout could be formatted nicely. Googling ‘rmarkdown mail merge’ didn’t yield much (that’s why I’m posting this), but I finally came across this tutorial which called the process ‘iterative reporting’.