Calculate the Standard Deviation of Sales Data

3 min read · Sep 13, 2024

Hey, guys! Welcome to the 4th Problem-Solving Article in our Data Science and Analytics Series. In today’s article, we’ll solve the Standard Deviation of Sales Data Problem using Python.

So, let’s see our Problem-Statement now.

Problem 3: Standard Deviation

You have been given the following dataset representing the number of products sold over five days: 20, 35, 30, 25, 40.

Calculate the standard deviation of the number of products sold.

Explanation

Okay, here we’re given a dataset that represents the number of products sold over five days. Our task is to calculate the Standard Deviation of this dataset. We’ll do it using 3 different methods and also discuss which one is more accurate and efficient.

But before we jump directly into the solution, let’s briefly understand what Standard Deviation actually is.

Standard Deviation

Standard Deviation is a measure of the amount of variation or dispersion in a set of values. In simple terms, it tells us how much the values in our dataset deviate from the Mean (average) value.

A low Standard Deviation means the values are close to the Mean, while a high Standard Deviation shows that the values are spread out over a wider range.
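
To make this concrete, here’s a small sketch comparing two made-up datasets (they are not part of the problem) that share the same Mean but differ in spread. The names steady and volatile are just for illustration.

```python
import statistics as st

# Two hypothetical five-day sales records with the same mean (30)
steady = [29, 30, 30, 30, 31]    # values stay close to the mean
volatile = [10, 20, 30, 40, 50]  # values are spread far from the mean

print(st.mean(steady), st.mean(volatile))  # both means are 30
print(round(st.stdev(steady), 3))    # 0.707
print(round(st.stdev(volatile), 3))  # 15.811
```

Same Mean, very different Standard Deviation: the second dataset simply moves around a lot more from day to day.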

Alright! Let’s get into it and start writing our code now.

Solution

The very first thing we can do here is store the sales figures in a list.

sales = [20, 35, 30, 25, 40]

1. Statistics

Let’s first import our statistics module.

# Method 1: Statistics
import statistics as st

The statistics module has a stdev() function that calculates the sample Standard Deviation.

stdVal = st.stdev(sales)
print(stdVal)  # Output: 7.905694150420948

And here we go! We got our Standard Deviation value which is approx. 7.906.

2. Pandas

Pandas is a popular Library for Data Manipulation and Analysis. Here, we’ll first convert our dataset into a Pandas Series.

# Method 2: Pandas
import pandas as pd

salesSeries = pd.Series(sales)

Now, we can use the std() method to calculate the Standard Deviation.

stdVal = salesSeries.std()
print(stdVal)  # Output: 7.905694150420948

And here again, we got a Standard Deviation value of approx. 7.906, the same as with the statistics module. Both methods agree so far.

3. NumPy

NumPy is a popular Python Library which we use for Numerical Computation. Let’s import it first.

# Method 3: NumPy
import numpy as np

In NumPy, we have a std() function that we can use to calculate the Standard Deviation.

stdVal = np.std(sales)
print(stdVal)  # Output: 7.0710678118654755

Ohh! Here, we are not getting the value we were getting earlier. It’s approx. 7.071. But in our previous approaches, we got a value of approx. 7.906. What went wrong then?

Well, nothing went wrong. By default, np.std() calculates the Standard Deviation without Bessel’s Correction, that is, for the entire population: it divides by n, and not by (n-1). The stdev() function and the Pandas std() method, on the other hand, treat the data as a sample and divide by (n-1).

That’s the reason we get a slightly different result here. If we want NumPy to calculate it for sample data, we can set the ddof (Delta Degrees of Freedom) parameter to 1, which applies Bessel’s Correction.

stdVal = np.std(sales, ddof=1)
print(stdVal)  # Output: 7.905694150420948

When Bessel’s Correction is applied, NumPy divides by (n-1), which is why we now get the same result as before: approx. 7.906.
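
To see where the two numbers come from, here’s a rough from-scratch sketch. The helper name std_dev is made up for this illustration (it’s not a library function); its ddof argument just mirrors the NumPy parameter, so the divisor is n - ddof.

```python
import math

def std_dev(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - ddof))

sales = [20, 35, 30, 25, 40]

population_std = std_dev(sales)      # divides by n = 5
sample_std = std_dev(sales, ddof=1)  # divides by n - 1 = 4

print(round(population_std, 3))  # 7.071
print(round(sample_std, 3))      # 7.906
```

The sum of squared deviations is 250 in both cases; only the divisor changes (250 / 5 = 50 vs. 250 / 4 = 62.5), and the square roots of those give the two values.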

I hope there’s no more confusion about the different Standard Deviation values. As I’ve already said, it’s because of the different defaults (population vs. sample) these approaches use.

And that’s all for today’s article. I hope this article was helpful. If you have any doubts, feel free to ask me.
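
As a small aside, the statistics module also has a pstdev() function for the population version, so the same contrast can be sketched with the standard library alone:

```python
import statistics as st

sales = [20, 35, 30, 25, 40]

sample_std = st.stdev(sales)       # sample: divides by n - 1
population_std = st.pstdev(sales)  # population: divides by n

print(round(sample_std, 3))      # 7.906
print(round(population_std, 3))  # 7.071
```

Pandas works the other way around from NumPy: its std() method also takes a ddof argument, but the default there is 1, so passing ddof=0 should give the population value.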

Thanks for reading! 😊