Naive Bayes in Python
Naive Bayes classification is a simple, yet effective algorithm. It’s commonly used in things like text analytics and works well on both small datasets and massively scaled out, distributed systems.
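To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It is not the code from the linked post: the toy documents, the labels, and the function names `train_naive_bayes` and `predict` are assumptions made for illustration, and add-one (Laplace) smoothing is one common choice among several.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents with this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (word, class) pairs are not zero.
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
training_docs = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train_naive_bayes(training_docs)
print(predict(model, "buy cheap money"))
print(predict(model, "monday meeting"))
```

The "naive" part is the assumption that words are independent given the class, which is why the score is just a sum of per-word log probabilities.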

The Current State of Machine Intelligence
I spent the last three months learning about every artificial intelligence, machine learning, or data-related startup I could find. My current list has 2,529 of them, to be exact. Yes, I should find better things to do with my evenings and weekends but until then…

Visualizing Optimization Algos
Algos without scaling based on gradient information really struggle to break symmetry here – SGD gets nowhere and Nesterov Accelerated Gradient / Momentum exhibit oscillations until they build up velocity in the optimization direction. Algos that scale step size based on the gradient quickly break symmetry and begin descent.
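A rough sketch of that effect, under stated assumptions: the saddle function f(x, y) = x² − y², the starting point, the learning rates, and the choice of Adagrad as the gradient-scaled method are all picked here for illustration and are not taken from the original visualization. Starting almost exactly on the symmetric axis (tiny y), plain SGD barely moves in y, while Adagrad's per-coordinate scaling makes the y step close to the learning rate regardless of how small the gradient is.

```python
def grad(x, y):
    """Gradient of the toy saddle f(x, y) = x**2 - y**2."""
    return 2.0 * x, -2.0 * y


def run_sgd(x, y, lr=0.01, steps=100):
    """Plain gradient descent: the step is proportional to the raw gradient."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y


def run_adagrad(x, y, lr=0.1, steps=100, eps=1e-8):
    """Adagrad: each coordinate's step is divided by the root of its
    accumulated squared gradients, so tiny gradients still move."""
    sum_gx2 = 0.0
    sum_gy2 = 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        sum_gx2 += gx * gx
        sum_gy2 += gy * gy
        x -= lr * gx / (sum_gx2 ** 0.5 + eps)
        y -= lr * gy / (sum_gy2 ** 0.5 + eps)
    return x, y


# Start almost on the symmetric axis: y is tiny, so the y-gradient is tiny.
start = (1.0, 1e-6)
sgd_x, sgd_y = run_sgd(*start)
ada_x, ada_y = run_adagrad(*start)
print("SGD     y after 100 steps:", sgd_y)
print("Adagrad y after 100 steps:", ada_y)
```

After the same number of steps, the SGD run is still sitting essentially on the axis, while the Adagrad run has moved well away from it along the descent direction.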

Java: Deeplearning4j Quickstart: Run Your First Examples In Minutes
This screencast explains how to run your first examples with Deeplearning4j. It’s pretty easy.

Data Animations With Python and MoviePy
Python has some great data visualization libraries, but few can render GIFs or video animations. This post shows how to use MoviePy as a generic animation plugin for any other library.

2015 Predictions – What’s Next for Data Scientists?
What’s next for data scientists in 2015: the new areas they will focus on, from cyber threats to fraud detection, and how the expectations for this profession will change.

New Research Shows Businesses are Investing Heavily in Big Data Analytics
Big data is more than a buzzword, as proven by how fast organizations are adopting new analytics technologies to obtain business value from it. That is the key takeaway from a Luth Research survey of large organizations currently using big data analytics software or planning to use it in the next 12 months.

New ASA Guidelines for Undergraduate Statistics Programs
The report places good statistical practice firmly on the foundation of the scientific method and locates statistical knowledge and skills squarely in the center of modern data analysis.

Big Data: The Key Vocabulary Everyone Should Understand
The field of Big Data requires more clarity and I am a big fan of simple explanations. This is why I have attempted to provide simple explanations for some of the most important technologies and terms you will come across if you’re looking at getting into big data.

10 data science predictions for 2015
These predictions were published by the International Institute for Analytics (IIA).
1. Organizations will clarify CAO and CDO roles and how they fit into their overall staffing structure.
2. Storytelling will be the hot new job in analytics.
3. Ensemble methods for analytical models will grow in popularity.
4. The application of analytics for holistic and integrated security breach prevention will be a top priority.
5. We will see the emergence of “The Analytics of Things.”
6. Companies will double their investment in generating new and unique data.
7. Hadoop will go mainstream.
8. Privacy demands will spark tools and services that allow consumers to determine if and how their data is shared and at what price.
9. Analytics, machine learning, and cognitive computing will increasingly take over the jobs of knowledge workers.
10. Automated decision-making will come of age in 2015.

Discover, Access, Distill: The Essence of Data Science
I coined this motto – the fact that data science is Discover, Access, Distill, back in October 2013. It looks like this is becoming a popular concept. It is the title of Fred Cadena’s blog. It is also the subject of an article on, and published in Vincent Granville’s Wiley book (page 9). Anyway, for those interested, here’s the original discussion:

Introduction to Data Analysis
This course is an introduction to analyzing data with the R software. It features some mathematics and statistics as well as some statistical computing and data visualization. You will need a laptop with an Internet connection to follow the class. To get started, download the entire course. To take a look at what the course material is made of, view it on GitHub first. It’s not a large download.

I need answers now! – Using simulation to jump-start an experiment
Say for the moment you’re building a website serving some kind of service to millions of users. You know all about progressive refinement using A/B testing methodologies, and you’ve found some easy wins by tweaking various features on the site and observing the results. Maybe you even launched some completely new user interaction mechanics as an experiment and verified that they improve key metrics. But now you have a proposed feature that your lead engineer thinks will increase engagement by a wide margin and your lead designer thinks will hurt engagement irreparably. What do you do?
It is tempting to say “let’s run the experiment anyway.” You can roll it out to just a small fraction of the user base, and you can always pull the plug on the experiment if key metrics take a turn for the worse, right?

DeepPy: Deep learning in Python
DeepPy tries to combine state-of-the-art deep learning models with a Pythonic interface in an extensible framework.

Finding similar neighborhoods across cities
Our friend Oliver lives in London, where he works as a consultant for a big financial company. Occasionally, he takes a business trip to another major city, to seal a major deal, and make major bucks for his boss. Of course, Oliver being Oliver, he always finds the time to enjoy whatever city he happens to be in. When in London, he likes to suit up and hang out in Soho, a “predominantly fashionable district of upmarket restaurants.” He would like to do that also in Rome, where he’s flying to next week, but he doesn’t know much about that city. Where is the Soho of Rome? What neighborhood of Rome is most similar to Soho?