Learn Gradient Boosting Algorithm for better predictions (with codes in R)

The accuracy of a predictive model can be boosted in two ways: either by embracing feature engineering or by applying boosting algorithms straight away. Having participated in many data science competitions, I’ve noticed that people prefer to work with boosting algorithms, as they take less time and produce similar results. There are multiple boosting algorithms, such as Gradient Boosting, XGBoost, AdaBoost, and Gentle Boost. Each has its own underlying mathematics, and they vary slightly in how they are applied. If you are new to this, great! You will be learning all these concepts within a week. In this article, I’ve explained the underlying concepts and complexities of the Gradient Boosting algorithm. I’ve also shared an example to learn its implementation in R.

Hypothesis Driven Development Part III: Monte Carlo In Asset Allocation Tests

This post will show how to use Monte Carlo to test for signal intelligence. Although I had rejected this strategy in the last post, I was asked to do a Monte Carlo analysis of a thousand random portfolios to see how the various signal processes performed against said distribution. Essentially, the process is quite simple: as I’m selecting one asset each month to hold, I simply generate a random number between 1 and the number of assets (5 in this case), and hold that asset for the month. Repeat this process for the number of months, then repeat the whole thing a thousand times, and see where the signal processes fall across that distribution.
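The procedure described above can be sketched in a few lines. This is a minimal illustration in Python rather than the post's own code: the function names, the fixed seed, and the synthetic return data are all assumptions made for the example, and the signal's return at the end is a made-up number.

```python
import random

def random_portfolio_returns(monthly_returns, n_sims=1000, seed=42):
    """Simulate random one-asset-per-month portfolios.

    monthly_returns: list of rows, one per month; each row holds the
    simple return of every asset for that month (hypothetical layout).
    Returns a list with the cumulative return of each simulated portfolio.
    """
    rng = random.Random(seed)
    n_assets = len(monthly_returns[0])
    results = []
    for _ in range(n_sims):
        growth = 1.0
        for row in monthly_returns:
            pick = rng.randint(1, n_assets)   # random number between 1 and n_assets
            growth *= 1.0 + row[pick - 1]     # hold that asset for the month
        results.append(growth - 1.0)
    return results

def percentile_rank(value, distribution):
    """Fraction of simulated portfolios that did no better than `value`."""
    return sum(1 for x in distribution if x <= value) / len(distribution)

# Synthetic data for illustration only: 60 months, 5 assets.
data_rng = random.Random(0)
fake_returns = [[data_rng.gauss(0.005, 0.04) for _ in range(5)] for _ in range(60)]
sims = random_portfolio_returns(fake_returns)
signal_return = 0.35  # hypothetical cumulative return of a signal process
print(percentile_rank(signal_return, sims))
```

A signal process that lands near the middle of the simulated distribution is doing no better than picking an asset at random each month.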

The New Microsoft Data Science User Group Program

We are very pleased to announce that Microsoft will not only continue Revolution Analytics’ tradition of supporting R user groups worldwide, but is also expanding the scope of the user group program. The new 2016 Microsoft Data Science User Group Sponsorship Program is open to all user groups that are passionate about open-source data science technologies. If your group is focused on R, Python, Apache Hadoop or some other vital data science technology, you may qualify for the Microsoft program. The major criteria for participation are that you have a public web presence, that you hold meetings on a regular basis, and that you can demonstrate your commitment to furthering and contributing to your particular corner of the open-source data science community.

Fitting Polynomial Regression in R

A linear relationship between two variables x and y is one of the most common, effective and easy assumptions to make when trying to figure out their relationship. Sometimes, however, the true underlying relationship is more complex than that, and this is where polynomial regression comes in to help.
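To make the idea concrete, here is a small from-scratch sketch of a least-squares polynomial fit, written in Python with only the standard library. It is not the article's R code; the function name `polyfit` and the sample data are assumptions for illustration, and the normal-equations approach used here is only suitable for low degrees and small, well-scaled data.

```python
def polyfit(xs, ys, degree):
    """Least-squares fit of y ~ b0 + b1*x + ... + b_degree*x**degree.

    Solves the normal equations (X^T X) b = X^T y, where X[i][j] = xs[i]**j,
    and returns the coefficients [b0, b1, ..., b_degree].
    """
    m = degree + 1
    A = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs

# Hypothetical data that follows y = 1 + 2x + 3x^2 exactly.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print(polyfit(xs, ys, degree=2))
```

With `degree=1` the same function reduces to an ordinary straight-line fit, which makes it easy to compare the two assumptions on the same data.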

When is a Backtest Too Good to be True?

One statistic which I find useful to form a first impression of a backtest is the success/winning percentage. Since it can mean different things, let’s be more precise: for a strategy over daily data, the winning percentage is the percentage of the days on which the strategy had positive returns (in other words, the strategy guessed the sign of the return correctly on these days). Now the question: if I see a 60% winning percentage for an S&P 500 strategy, does/should my bullshit-alarm go off?
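The definition above translates directly into code. The sketch below, in Python, computes the winning percentage and then shows one possible way to frame the "too good to be true" question: how likely a given number of winning days would be if each day were an independent coin flip. That coin-flip null hypothesis, the function names, and the sample numbers are assumptions for illustration, not something taken from the post.

```python
from math import comb

def winning_percentage(daily_returns):
    """Share of days on which the strategy had a strictly positive return."""
    wins = sum(1 for r in daily_returns if r > 0)
    return wins / len(daily_returns)

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k winning days
    out of n if each day wins independently with probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical daily returns, for illustration only.
returns = [0.004, -0.002, 0.010, 0.0, -0.007, 0.003]
print(winning_percentage(returns))

# Chance of 60% or more winning days over 250 trading days under the coin-flip null.
print(binomial_tail(250, 150))
```

Note that a day with exactly zero return is not counted as a win here, in line with the "positive returns" wording of the definition.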

Analyze Data: Five Ways You Can Make Interactive Maps

Plotly’s new map making tools let you tell stories about data as it relates to geography. This post shows five examples of how you can make and style choropleth, subplot, scatter, bubble, and line maps. We made these maps with our APIs for R and Python. In the future, we will also support maps from our web app. Let us know if you have suggestions or feedback. You can integrate your maps with dashboards, IPython Notebooks, Shiny, PowerPoint, reports, and databases. For users who want to securely share graphs and data within a team and make interactive dashboards, contact us about Plotly on-premise.

SPARQL is the new King of all Data Scientist’s tools

Inspired by the development of semantic technologies in recent years, the traditional methodology of designing, publishing and consuming statistical datasets in the field of statistical analysis is evolving toward so-called “Linked Statistical Data”, which associates semantics with dimensions, attributes and observation values based on Linked Data design principles.

A Great way to learn Data Science by simply doing it

There are tons of great online resources out there that we can pick up and learn from to become a master in data science. Here is a comprehensive list of data science course providers, along with links to their data science courses.