Calculate the Standard Deviation of Sales Data
Hey, guys! Welcome to the 4th Problem-Solving Article in our Data Science and Analytics Series. In today’s article, we’ll solve the Standard Deviation of Sales Data Problem using Python.
So, let’s see our Problem-Statement now.
Problem 3: Standard Deviation
You have been given the following dataset representing the number of products sold over five days: 20, 35, 30, 25, 40.
Calculate the standard deviation of the number of products sold.
Explanation
Okay, here we’re given a dataset that represents the number of products sold over five days. Our task is to calculate the Standard Deviation of this dataset. We’ll do it using 3 different methods and also discuss which one is more accurate and efficient.
But before we jump into the solution, let’s briefly understand what Standard Deviation actually is.
Standard Deviation
Standard Deviation is a measure of the amount of variation or dispersion in a set of values. In simple terms, it tells us how much the values in our dataset deviate from the Mean (average) value.
A low Standard Deviation means the values are close to the Mean, while a high Standard Deviation shows that the values are spread out over a wider range.
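For readers who prefer symbols, here is how this definition is usually written. The notation is my own assumption and not part of the problem statement: x_1, ..., x_n are the values, x̄ is their Mean, σ is the Standard Deviation of a whole population, and s is the Standard Deviation estimated from a sample.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The only difference between the two is the divisor: n for a population, (n-1) for a sample.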
Alright! Let’s get into it and start writing our code now.
Solution
The first thing we can do here is store the sales figures in a list.
sales = [20, 35, 30, 25, 40]
1. Statistics
Let’s first import our statistics module.
# Method 1: Statistics
import statistics as st
The statistics module has a stdev() function that calculates the sample Standard Deviation.
stdVal = st.stdev(sales)
print(stdVal) # Output: 7.905694150420948
And here we go! We got our Standard Deviation value which is approx. 7.906.
2. Pandas
Pandas is a popular Library for Data Manipulation and Analysis. Here, we’ll first convert our dataset to a Pandas Series.
# Method 2: Pandas
import pandas as pd
salesSeries = pd.Series(sales)
Now, we can use the std() method to calculate the Standard Deviation.
stdVal = salesSeries.std()
print(stdVal) # Output: 7.905694150420948
And here again, we got a Standard Deviation value of approx. 7.906. Both methods agree so far.
3. NumPy
NumPy is a popular Python Library which we use for Numerical Computation. Let’s import it first.
# Method 3: NumPy
import numpy as np
NumPy has a std() function that we can use to calculate the Standard Deviation.
stdVal = np.std(sales)
print(stdVal) # Output: 7.0710678118654755
Ohh! Here, we’re not getting the value we got earlier. It’s approx. 7.071, but our previous approaches gave approx. 7.906. What went wrong?
Well, nothing went wrong. By default, np.std() calculates the Standard Deviation for the entire population: it divides by n, and not by (n-1). In other words, it doesn’t apply Bessel’s Correction, while statistics.stdev() and the Pandas std() method do.
So the difference comes only from which formula each library uses by default. That’s why we get a slightly different value here. If we want to calculate it for sample data, we can set the ddof (Delta Degrees of Freedom) parameter to 1, which applies Bessel’s Correction.
stdVal = np.std(sales, ddof=1)
print(stdVal) # Output: 7.905694150420948
With Bessel’s Correction applied, NumPy now divides by (n-1), which is why we get the same result as before: approx. 7.906.
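To see where the two numbers come from, we can also do the calculation step by step without any Library function. This is just a minimal sketch for illustration. The variable names (population_sd, sample_sd and so on) are my own, and the final check uses stdev() and pstdev() from the statistics module.

```python
import math
import statistics as st

sales = [20, 35, 30, 25, 40]

n = len(sales)
mean = sum(sales) / n                             # 30.0
squared_devs = [(x - mean) ** 2 for x in sales]   # [100.0, 25.0, 0.0, 25.0, 100.0]
total = sum(squared_devs)                         # 250.0

population_sd = math.sqrt(total / n)        # divide by n (no Bessel's Correction)
sample_sd = math.sqrt(total / (n - 1))      # divide by (n-1) (Bessel's Correction)

print(round(population_sd, 3))  # 7.071
print(round(sample_sd, 3))      # 7.906

# The statistics module offers both conventions as separate functions.
assert math.isclose(population_sd, st.pstdev(sales))
assert math.isclose(sample_sd, st.stdev(sales))
```

So the sum of squared deviations is 250 in both cases. Dividing it by 5 gives 50, dividing it by 4 gives 62.5, and the square roots of these are the two values we saw.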
I hope this clears up the confusion about the different Standard Deviation values. As I said, it comes down to whether the Library divides by n or by (n-1) by default.
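As a quick recap, here is a small sketch that asks all three Libraries for both versions side by side. It assumes pandas and NumPy are installed. The dictionary names are my own, and the code only uses stdev() and pstdev() from statistics plus the ddof parameter of the Pandas and NumPy std() functions.

```python
import statistics as st

import numpy as np
import pandas as pd

sales = [20, 35, 30, 25, 40]

# Sample Standard Deviation: divides by (n-1)
sample_results = {
    "statistics.stdev": st.stdev(sales),
    "pandas std (default, ddof=1)": pd.Series(sales).std(),
    "numpy std (ddof=1)": np.std(sales, ddof=1),
}

# Population Standard Deviation: divides by n
population_results = {
    "statistics.pstdev": st.pstdev(sales),
    "pandas std (ddof=0)": pd.Series(sales).std(ddof=0),
    "numpy std (default, ddof=0)": np.std(sales),
}

for name, value in {**sample_results, **population_results}.items():
    print(f"{name}: {value:.3f}")
```

The first three lines print 7.906 and the last three print 7.071, so once the divisor is the same, all three Libraries agree.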
And that’s all for today’s article. I hope it was helpful. If you have any questions, feel free to ask me.
Thanks for reading! 😊