Difference Between Correlation and Regression in Statistics

Correlation and regression are two analyses based on multivariate distributions. A multivariate distribution is a distribution of multiple variables. Correlation is the analysis that tells us whether two variables ‘x’ and ‘y’ are associated, and how strongly. Regression analysis, on the other hand, predicts the value of the dependent variable from the known value of the independent variable, assuming an average mathematical relationship between two or more variables. The difference between correlation and regression is one of the most commonly asked questions in interviews, and many people find the distinction between the two unclear. So read this article in full to get a clear understanding of both.
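As a minimal sketch of the distinction, the snippet below computes both quantities for the same data with NumPy. The numbers are invented for illustration (x could stand for advertising spend and y for sales), and the helper name predict is an assumption, not something taken from the article.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Correlation: one symmetric number in [-1, 1] that measures the
# strength of the linear association between x and y.
r = np.corrcoef(x, y)[0, 1]

# Regression: fit y = intercept + slope * x by least squares.
# Unlike correlation, this treats x as the independent variable
# and y as the dependent variable.
slope, intercept = np.polyfit(x, y, deg=1)


def predict(x_new):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x_new


print(f"correlation r = {r:.3f}")
print(f"fitted line: y = {intercept:.3f} + {slope:.3f} * x")
print(f"predicted y at x = 6: {predict(6.0):.3f}")
```

Swapping x and y leaves r unchanged but gives a different fitted line, which is one practical way to see that correlation describes association while regression describes a directed, predictive relationship.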

Modern Approaches for Sales Predictive Analytics

Sales prediction is an important part of modern business intelligence. The first approaches one can apply to predict sales time series are conventional forecasting methods such as ARIMA and Holt-Winters. But these methods face several challenges: multilevel daily/weekly/monthly/yearly seasonality, many exogenous factors that affect sales, and complex trends in different time periods. In such cases, conventional methods are not easy to apply. Of course, there is SARIMA for time series with seasonality, and SARIMAX for time series with seasonality and exogenous factors. But these implementations handle only simple seasonality with a single period, and exogenous factors are treated as covariates for residuals. They also make it hard to apply different types of regularization to avoid overfitting, or to take feature interactions into account. The other main problem is that in some real cases of store sales we do not have a sufficient number of historical time series values, e.g. when a store has been opened recently or has recently been acquired by a store chain. Historical sales values can also be lacking when a new product has been launched recently. How can we predict sales in such cases? In general, then, sales prediction can be a hard and complex problem. To get insights and find new approaches, some companies pose problems of this type in data science competitions, e.g. on Kaggle. The company Grupo Bimbo organized the Kaggle competition Grupo Bimbo Inventory Demand, in which it invited Kagglers to develop a model to accurately forecast inventory demand based on historical sales data. I had the pleasure of being a member of a great team, “The Slippery Appraisals”, which won this competition among nearly two thousand teams. We proposed the best-scoring solution for sales prediction in more than 800,000 stores for more than 1000 products.

Data Scientists 4.0

The 4th Industrial Revolution was publicly announced in 2011 at the Hannover Fair (1). Since then, many resources have appeared around the so-called Industry 4.0. Elements such as Digital Twins, the Industrial Internet of Things and Cyber-Physical Systems have come onto the scene as inseparable elements, providing the necessary ingredients for a paradigm shift in many manufacturing areas. Of all these components, the most pioneering are those related to predictive analytics and artificial intelligence, applied directly to real use cases that had no solution just a few years ago. Predictive maintenance, artificial vision and pattern recognition mechanisms that identify potential failures in real time are some of the many use cases being applied in the new industry.

Hierarchical Classification – a useful approach when predicting thousands of possible categories

Traditionally, most multi-class classification problems (i.e. problems where you want to predict which of a set of possible classes a given sample falls into) focus on a small number of possible predictions. For most purposes, whether teaching data science or dealing with many real-life scenarios, this is fine. Such scenarios include the typical examples of classifying a given e-mail as Spam/Legitimate, classifying an image of a skin mole as melanoma/normal, or identifying the music genre of a song playing on the radio.
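To contrast this flat setting with the hierarchical approach named in the title, here is a minimal sketch of a two-level classifier: one model picks a broad parent category, then a per-parent model picks the final class. Everything in it is an assumption made for illustration: the taxonomy (PARENT), the toy 2-D features, and the small NearestCentroid stand-in, which could be replaced by any real classifier. It is not the method used in the article.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in classifier: predicts the label with the closest mean point."""

    def fit(self, X, labels):
        self.labels_ = sorted(set(labels))
        labels = np.array(labels)
        self.centroids_ = np.array(
            [X[labels == label].mean(axis=0) for label in self.labels_]
        )
        return self

    def predict_one(self, x):
        distances = np.linalg.norm(self.centroids_ - x, axis=1)
        return self.labels_[int(np.argmin(distances))]


# Hypothetical two-level taxonomy: final class -> parent category.
PARENT = {"rock": "music", "jazz": "music", "news": "speech", "podcast": "speech"}

# Toy 2-D features and final-class labels, invented for illustration.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 0.0], [1.1, 0.2],
              [5.0, 5.0], [5.1, 5.2], [6.0, 5.0], [6.2, 5.1]])
y = ["rock", "rock", "jazz", "jazz", "news", "news", "podcast", "podcast"]

# Level 1: a single model that only separates the parent categories.
parents = [PARENT[label] for label in y]
top_model = NearestCentroid().fit(X, parents)

# Level 2: one model per parent, trained only on that parent's samples.
child_models = {}
for parent in set(parents):
    idx = [i for i, p in enumerate(parents) if p == parent]
    child_models[parent] = NearestCentroid().fit(X[idx], [y[i] for i in idx])


def predict(x):
    """Route the sample through the parent model, then the matching child model."""
    parent = top_model.predict_one(x)
    return parent, child_models[parent].predict_one(x)


print(predict(np.array([0.1, 0.0])))
print(predict(np.array([6.1, 5.0])))
```

Each model in the hierarchy only has to choose among a handful of options, which is the appeal when the total number of final classes runs into the thousands. The trade-off is that a mistake at the parent level cannot be corrected further down.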