Build a Predictive Model in 10 Minutes (using Python)

Last week, we published “Perfect way to build a Predictive Model in less than 10 minutes using R”. Anyone could guess the follow-up. Python has risen over the last few years and is simple to use, so it makes sense to have this toolkit ready for the Pythonistas in the data science world. I will follow a similar structure to the previous article and add my own inputs at different stages of model building. Together, these two articles will help you build your first predictive model faster and with more predictive power. Most top data scientists and Kagglers build a first effective model quickly and submit it. This gives them a head start on the leaderboard and also provides a benchmark solution to beat.
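The simplest benchmark to beat is often a model that always predicts the most frequent class. The sketch below is a hypothetical, standard-library-only illustration of that idea, not code from either article. `majority_baseline` and the toy labels were made up for the example.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Predict the most frequent training label for every test row.

    Returns (predicted_label, accuracy_on_test).
    """
    # Most common class in the training labels
    majority, _ = Counter(train_labels).most_common(1)[0]
    # Share of test rows whose true label equals that class
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)


if __name__ == "__main__":
    # Hypothetical toy labels, purely for illustration
    train = [0, 0, 1, 0, 1, 0]
    test = [0, 1, 0, 0]
    label, acc = majority_baseline(train, test)
    print(label, acc)  # prints "0 0.75"
```

A real model should score clearly above this number before it is worth submitting.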

Forecasting Time Series Data with Multiple Seasonal Periods

Time series data is produced in domains such as IT operations, manufacturing, and telecommunications. Examples include the number of client logins to a website per day, cell phone traffic collected per minute, and hourly temperature variation in a region. Forecasting a time series helps us make decisions such as planning capacity and estimating demand. Previous time series analysis blog posts focused on using SQL functions to process time series data that resides in a Greenplum database. In this post, I will examine the modeling steps involved in forecasting a time series with multiple seasonal periods.
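The summary does not say which model the post uses. One common way to handle several seasonal periods is harmonic regression: add a sine/cosine pair per harmonic for each period, then fit by least squares. The sketch below assumes that approach, NumPy, and hourly data with daily (24) and weekly (168) cycles. The function names are hypothetical.

```python
import numpy as np


def fourier_features(t, periods, harmonics):
    """Design matrix: intercept, linear trend, and sin/cos terms per period.

    t: 1-D array of time indices; periods: e.g. (24, 168) for hourly data
    with daily and weekly cycles; harmonics: sin/cos pairs per period.
    """
    cols = [np.ones_like(t, dtype=float), t.astype(float)]
    for p, k_max in zip(periods, harmonics):
        for k in range(1, k_max + 1):
            cols.append(np.sin(2 * np.pi * k * t / p))
            cols.append(np.cos(2 * np.pi * k * t / p))
    return np.column_stack(cols)


def fit_and_forecast(y, periods, harmonics, horizon):
    """Fit the harmonic regression by least squares, forecast `horizon` steps."""
    n = len(y)
    X = fourier_features(np.arange(n), periods, harmonics)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    X_future = fourier_features(np.arange(n, n + horizon), periods, harmonics)
    return X_future @ coef


if __name__ == "__main__":
    # Synthetic hourly series (4 weeks) with daily and weekly cycles plus noise
    rng = np.random.default_rng(0)
    t = np.arange(24 * 7 * 4)
    y = (10 + 0.01 * t + 3 * np.sin(2 * np.pi * t / 24)
         + 2 * np.cos(2 * np.pi * t / 168) + rng.normal(0, 0.5, t.size))
    forecast = fit_and_forecast(y, periods=(24, 168), harmonics=(2, 3), horizon=48)
    print(np.round(forecast[:6], 2))
```

Because 168 = 7 × 24, the 7th weekly harmonic duplicates the 1st daily one. Keep harmonic counts low or drop duplicated columns. `np.linalg.lstsq` still returns a minimum-norm solution when columns are collinear.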

Convergence and Asymptotic Results

Visualize the law of large numbers.
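A quick way to see the law of large numbers is to track the running mean of simulated fair die rolls, which should settle near the expected value 3.5. This is a minimal sketch of our own using only the standard library. It prints checkpoints instead of drawing a chart, and any plotting library could chart the returned list.

```python
import random


def running_means(n, seed=0):
    """Roll a fair six-sided die n times; return the running mean after each roll."""
    rng = random.Random(seed)  # seeded for reproducibility
    total = 0
    means = []
    for i in range(1, n + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


if __name__ == "__main__":
    means = running_means(100_000)
    # The law of large numbers says these approach E[X] = 3.5 as n grows
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, round(means[n - 1], 3))
```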

Latent Hierarchical Model for Activity Recognition

Open-source software for human activity recognition using RGB-D sensors.

2015 Data Science Salary Survey

Now in its third edition, the 2015 version of the Data Science Salary Survey explores patterns in tools, tasks, and compensation through the lens of clustering and linear models. The research is based on data collected through an online 32-question survey, including demographic information, time spent on various data-related tasks, and the use/non-use of 116 software tools. Over 600 respondents from a variety of industries completed the survey, two-thirds of whom are based in the United States.

subsetting data in ggtree

Subsetting is commonly used in ggtree, for example to separate internal nodes from tips. We may also want to display annotations on specific nodes or tips.
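ggtree is an R package, but the idea behind separating tips from internal nodes is language-neutral: a tip is a node that never appears as a parent. The Python sketch below illustrates only that idea on a hypothetical (parent, child) edge list. It does not reproduce ggtree's API.

```python
def split_tips(edges):
    """Split tree nodes into tips (leaves) and internal nodes.

    edges: iterable of (parent, child) pairs describing a rooted tree.
    Returns (sorted tips, sorted internal nodes).
    """
    edges = list(edges)
    parents = {p for p, _ in edges}
    nodes = parents | {c for _, c in edges}
    # A tip is any node that has no children, i.e. is never a parent
    tips = sorted(n for n in nodes if n not in parents)
    internal = sorted(parents)
    return tips, internal


if __name__ == "__main__":
    # Hypothetical tree: root 4 has children 1 and 5; node 5 has children 2 and 3
    print(split_tips([(4, 1), (4, 5), (5, 2), (5, 3)]))  # ([1, 2, 3], [4, 5])
```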

Interpolation and smoothing functions in base R

Simulating backtests of stock returns using Monte-Carlo and snowfall in parallel

You could say that the following post is an answer, comment, or addition to Quintuitive, though I would consider it a small introduction to parallel computing with snowfall that uses Quintuitive's ideas as an example.
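snowfall is an R package, so the following is a Python analogue rather than a translation. It uses the standard library's `concurrent.futures.ProcessPoolExecutor` to run independent Monte Carlo simulations in parallel. We assume a simple bootstrap of historical daily returns as the simulation step, though the linked post may simulate paths differently. The return series and function names are hypothetical.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_path(args):
    """One Monte Carlo run: resample historical daily returns with replacement
    into a new path and return its compounded growth factor."""
    returns, n_days, seed = args
    rng = random.Random(seed)  # per-run seed keeps results reproducible
    growth = 1.0
    for _ in range(n_days):
        growth *= 1.0 + rng.choice(returns)
    return growth


def monte_carlo(returns, n_days, n_sims, workers=None):
    """Run n_sims independent simulations; serially if workers == 1,
    otherwise across a process pool (None = one worker per CPU)."""
    tasks = [(returns, n_days, seed) for seed in range(n_sims)]
    if workers == 1:
        return [simulate_path(t) for t in tasks]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_path, tasks, chunksize=64))


if __name__ == "__main__":
    # Hypothetical daily returns; in practice these come from price history
    hist = [0.01, -0.005, 0.002, -0.012, 0.007, 0.003, -0.001]
    results = sorted(monte_carlo(hist, n_days=252, n_sims=10_000))
    print("median growth:", results[len(results) // 2])
    print("5th percentile:", results[len(results) // 20])
```

Each task carries its own seed, so the results do not depend on how work is split across processes. Because `pool.map` preserves input order, `workers=1` and the parallel path return the same list. The `__main__` guard is required on platforms that start workers by spawning a fresh interpreter.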

Game-Changing Real-time Uses for Apache Spark

Apache Spark, often deployed on Hadoop, is great for processing large amounts of data quickly, but wouldn’t it be even better if you could process data in real time? If your business depends on making decisions quickly, you should consider the MapR distribution, which includes the complete Spark stack, including Spark Streaming. Here are some game-changing uses for real-time big data processing.