Introduction to Regression Splines (with Python codes)

As a beginner in the world of data science, the first algorithm I was introduced to was Linear Regression. I applied it to different datasets and noticed both its advantages and its limitations. It assumes a linear relationship between the dependent and independent variables, which is rarely the case in reality. As an improvement over this model, I tried Polynomial Regression, which generated better results (most of the time). But using Polynomial Regression on datasets with high variability tends to result in over-fitting.
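
The over-fitting point can be illustrated with a small sketch. This is not the article's own code: the synthetic sine-shaped data, the noise level, and the degrees compared (1, 3, 12) are all assumptions chosen for illustration. It fits polynomials of increasing degree with NumPy and reports the error on the training points and on held-out points.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)

def make_data(n):
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

# Fit a straight line, a cubic, and a high-degree polynomial.
results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

Training error can only go down as the degree rises, because each lower-degree fit is a special case of the higher-degree one. The held-out error is the number to watch: when it climbs while training error keeps falling, the polynomial is fitting noise.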

Distributed Deep Learning with Polyaxon

In this short tutorial, we will go over a new feature in Polyaxon: distributed training. Polyaxon currently supports and simplifies distributed training on the following frameworks: TensorFlow, MXNet, and PyTorch. To parallelize your computations across processes and clusters of machines, you need to adapt your code and update your polyaxonfile to specify the cluster definition. Polyaxon then takes care of creating the tasks and exporting the environment variables needed to enable distributed training.

Big Data Tech Hadoop and Spark Get Slow Start in Enterprise

There are plenty of success stories about Hadoop in the enterprise, but those may be the exception and not the rule. Gartner Research VP Merv Adrian provides a reality check on deployment rates, successes, and failures of big data technologies in the enterprise.

Descriptive Statistics: The Mighty Dwarf of Data Science

No other means of data description is more comprehensive than descriptive statistics, and with ever-increasing volumes of data and the growing need for low-latency decision making, its relevance will only continue to increase.

Exploring the underlying theory of the chi-square test through simulation – part 1

Kids today are so sophisticated (at least they are in New York City, where I live). While I didn’t hear about the chi-square test of independence until my first stint in graduate school, they’re already talking about it in high school. When my kids came home and started talking about it, I did what I usually do when they come home asking about a new statistical concept. I opened up R and started generating some data. Of course, they rolled their eyes, but when the evening was done, I had something that might illuminate some of what underlies the theory of this ubiquitous test. Actually, I created enough simulations to justify two posts – so this is just part 1, focusing on the $\chi^2$ distribution and its relationship to the Poisson distribution. Part 2 will consider contingency tables, where we are often interested in understanding the nature of the relationship between two categorical variables. More on that the next time.
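
One way to see the link between the Poisson and $\chi^2$ distributions is a quick simulation. The sketch below is not the post's own simulation (which was written in R); it is a Python illustration under assumed settings: `k = 5` independent cells, each with Poisson mean `lam = 50`, and 20,000 replications. For each replication it computes the Pearson-style statistic, the sum over cells of (observed − expected)² / expected, and compares its mean and variance with those of a $\chi^2$ distribution with `k` degrees of freedom, which are `k` and `2k`.

```python
import numpy as np

# Assumed settings for illustration only.
k = 5            # number of independent Poisson cells
lam = 50.0       # expected count in each cell
n_sims = 20_000  # number of simulated tables

rng = np.random.default_rng(42)

# Each row is one replication of k independent Poisson counts.
observed = rng.poisson(lam, size=(n_sims, k))

# Pearson-style statistic: sum over cells of (O - E)^2 / E, with E = lam.
x2 = ((observed - lam) ** 2 / lam).sum(axis=1)

sim_mean = float(x2.mean())
sim_var = float(x2.var(ddof=1))

print(f"simulated mean     = {sim_mean:.3f} (chi-square with {k} df: {k})")
print(f"simulated variance = {sim_var:.3f} (chi-square with {k} df: {2 * k})")
```

Because the expected counts are treated as known here rather than estimated from the data, no degrees of freedom are lost and the reference distribution has `k` of them. The approximation improves as `lam` grows, since a standardized Poisson count behaves more and more like a standard normal.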