I have never been formally trained in how to deal with seasonality, but I wanted to take a moment to share my perspective based on experience, which I hope readers will find fairly straightforward. Some people use sales revenue to evaluate seasonal differences; when possible, I find it more informative to analyze units sold. A price increase that produces slightly higher revenue does not in itself represent increased demand, nor should discounted prices that reduce revenue necessarily be read as reduced demand. Below I present some fictitious data, offered as a controlled sample. I would expect an objective person to say, ‘This product is selling poorly.’ Although the data is clearly fabricated, I will make some cosmetic adjustments in a moment to obscure the fine details.
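To make the revenue-versus-units point concrete, here is a minimal Python sketch with made-up numbers (not the fictitious dataset discussed above): units sold stay flat while a price increase lifts revenue, which could be mistaken for rising demand.

```python
# Hypothetical monthly data: demand (units) is flat, but a price
# increase in April lifts revenue, which can masquerade as demand.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
units = [100, 100, 100, 100, 100, 100]          # demand unchanged
price = [10.0, 10.0, 10.0, 12.0, 12.0, 12.0]    # price raised in April

revenue = [u * p for u, p in zip(units, price)]

# Revenue jumps 20% in April even though unit demand is flat.
rev_change = (revenue[3] - revenue[2]) / revenue[2]
unit_change = (units[3] - units[2]) / units[2]
print(rev_change, unit_change)  # → 0.2 0.0
```

Judged by revenue alone, April looks like a seasonal uptick; judged by units, nothing changed, which is exactly why units are the safer signal.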
In the time it took you to read this sentence, terabytes of data have been collectively generated across the world — more data than any of us could ever hope to process, much less make sense of, on the machines we’re using to read this notebook. In response to this massive influx of data, the field of Data Science has come to the forefront in the past decade. Cobbled together by people from a diverse array of fields — statistics, physics, computer science, design, and many more — the field of Data Science represents our collective desire to understand and harness the abundance of data around us to build a better world. In this notebook, I’m going to go over a basic Python data analysis pipeline from start to finish to show you what a typical data science workflow looks like. In addition to providing code examples, I also hope to imbue in you a sense of good practices so you can be a more effective — and more collaborative — data scientist. I will be following along with the data analysis checklist from The Elements of Data Analytic Style, which I strongly recommend reading as a free and quick guidebook to performing outstanding data analysis.
In this post I’m going to talk about something that’s relatively simple but fundamental to just about any business: Customer Segmentation. At its core, customer segmentation is about identifying different types of customers and then figuring out ways to find more of those individuals so you can… you guessed it, get more customers! I’ll detail how you can use K-Means clustering to help with some of the exploratory aspects of customer segmentation.
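As a taste of the K-Means idea, here is a small self-contained sketch in plain Python (the data and the deterministic initialization are my own illustration, not the post's): alternate between assigning each point to its nearest centroid and recomputing centroids.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its members."""
    # Deterministic (toy) initialization: the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids

# Two well-separated "customer" groups, e.g. (visits, spend) scaled.
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

In practice you would reach for a library implementation (e.g. scikit-learn's KMeans) with proper initialization and multiple restarts; the sketch just shows the two-step loop the algorithm repeats.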
Despite having done it countless times, I regularly forget how to build a cohort analysis with Python and pandas. I’ve decided it’s a good idea to finally write it out – step by step – so I can refer back to this post later on. Hopefully others find it useful as well. I’ll start by walking through what cohort analysis is and why it’s commonly used in startups and other growth businesses. Then, we’ll create one from a standard purchase dataset.
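The post builds the analysis with pandas; as a hedged pure-Python sketch of the underlying logic (with made-up purchase records), the two essential steps are: assign each user to the cohort of their first purchase month, then count unique active users per cohort and per months-since-first-purchase.

```python
from collections import defaultdict

# Toy purchase log: (user_id, purchase_month) with months as "YYYY-MM".
purchases = [
    ("u1", "2024-01"), ("u2", "2024-01"), ("u3", "2024-02"),
    ("u1", "2024-02"), ("u2", "2024-03"), ("u3", "2024-03"),
    ("u1", "2024-03"),
]

def month_index(m):
    """Map "YYYY-MM" to a running month count so we can subtract months."""
    y, mo = m.split("-")
    return int(y) * 12 + int(mo)

# Step 1: each user's cohort is the month of their first purchase.
cohort = {}
for user, month in sorted(purchases, key=lambda r: month_index(r[1])):
    cohort.setdefault(user, month)

# Step 2: unique active users per (cohort, months since first purchase).
active = defaultdict(set)
for user, month in purchases:
    period = month_index(month) - month_index(cohort[user])
    active[(cohort[user], period)].add(user)

retention = {key: len(users) for key, users in active.items()}
print(retention)
```

With pandas the same shape falls out of a groupby on (cohort, period) with `nunique`, followed by an unstack into the familiar cohort triangle; the dictionary above is that triangle in sparse form.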
In the last post, I went over the basics of lists, including constructing, manipulating, and converting lists to other classes. With the basics covered, in this post we’ll use the apply() functions to see just how powerful working with lists can be. I’ve done two posts on apply() for data frames and matrices, here and here, so give those a read if you need a refresher.
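The post itself is about R's apply() family (lapply() and friends) on lists; as a rough Python analog of the same idea, with made-up data, applying a function element-wise over a collection looks like this:

```python
# In R, lapply(grades, mean) applies mean() to every element of a
# named list. A rough Python analog is a comprehension (or map())
# over a dict of lists.
grades = {"alice": [90, 85, 88], "bob": [70, 75]}

# Element-wise apply, like lapply() returning a named list.
means = {name: sum(g) / len(g) for name, g in grades.items()}
print(means)
```

The payoff in both languages is the same: one expression replaces an explicit loop, and the function being applied can be swapped freely (sum, max, a custom closure) without touching the iteration.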
In my last couple of articles (Part 4, Part 5), I demonstrated a logistic regression model with binomial errors on binary data using R’s glm() function. But one of the wonderful things about glm() is its flexibility: it can run much more than logistic regression models. That flexibility, of course, also means you have to tell it exactly which model you want to run, and how. In fact, we can use generalized linear models to model count data as well.
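In R this is glm(y ~ x, family = poisson); as a hedged pure-Python sketch of roughly what glm() does under the hood, here is a tiny Newton-style fit (made-up data, one predictor) of a Poisson regression with a log link:

```python
import math

# Toy count data: y grows roughly exponentially in x, which is what
# a Poisson model with a log link assumes.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 7, 10]

# Fit log(mu) = b0 + b1*x by Newton's method (glm() uses IRLS,
# which is equivalent here). Start from the intercept-only fit.
b0, b1 = math.log(sum(y) / len(y)), 0.0
for _ in range(50):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Score vector (gradient of the Poisson log-likelihood).
    g0 = sum(yi - mi for yi, mi in zip(y, mu))
    g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # Information matrix (negative Hessian), a 2x2 system.
    h00 = sum(mu)
    h01 = sum(xi * mi for xi, mi in zip(x, mu))
    h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    b0, b1 = b0 + d0, b1 + d1
    if abs(d0) + abs(d1) < 1e-10:
        break

mu = [math.exp(b0 + b1 * xi) for xi in x]
# With an intercept, the Poisson MLE reproduces the total count exactly.
print(round(sum(mu), 6), round(b1, 3))
```

The point of the sketch is the structure, not the arithmetic: swapping the error distribution and link function changes only mu, the score, and the weights, which is exactly the flexibility glm()'s family argument exposes.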