Calculate the Range and IQR of Exam Scores

NIBEDITA (NS)
4 min read, Sep 26, 2024

Hey, guys! Welcome to the 6th article (the 5th problem-solving one) in our Data Science and Analytics Series. In today’s article, we’ll solve the Range & IQR problem using Python.

So, let’s look at the problem statement.

Problem 3: Range and IQR

You have been provided with the following dataset representing the exam scores:

examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]

Your tasks are to calculate:

  1. The range of the dataset.
  2. The Interquartile Range (IQR) of the dataset.

Explanation

Okay, here we’re given a dataset of exam scores, and our tasks are to calculate its range and IQR. We’ll do it with three different approaches:

  1. NumPy
  2. Pandas
  3. Statistics (Python’s built-in module)

But first, what exactly are the Range and IQR of a dataset?

1. Range

The range of a dataset is the difference between its maximum and minimum values. It gives us a quick idea of the spread of the data. However, the range is very sensitive to outliers, because it only considers the two extreme values.

Outliers are values that are extremely high or low compared with the rest of the dataset.

2. IQR

The IQR measures the spread of the middle 50% of the data, so it gives a better idea of the dataset’s typical variability. It is much less sensitive to outliers, because it focuses on the central portion of the data and sets aside the lowest 25% and the highest 25% of values.

IQR is calculated as the difference between the 75th percentile (Q3) and the 25th percentile (Q1).

Range Vs IQR

The Range looks at the extreme ends of the data, while the IQR looks at the central part of the data, which makes it far less affected by outliers.
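To see this difference in numbers, here’s a minimal sketch using Python’s built-in statistics module. The helper name rangeAndIqr and the extra score of 1000 (imagine a mistyped entry) are hypothetical, just for illustration. quantiles() with method="inclusive" places each quartile at position (n - 1) · p in the sorted data, which is the same rule NumPy’s percentile() uses by default.

```python
import statistics as st


def rangeAndIqr(scores):
    """Return (range, IQR) for a list of numbers.

    Hypothetical helper. method="inclusive" interpolates at position
    (n - 1) * p, the same rule as NumPy's default percentile().
    """
    q1, _, q3 = st.quantiles(scores, n=4, method="inclusive")
    return max(scores) - min(scores), q3 - q1


examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]
withOutlier = examScores + [1000]  # hypothetical mistyped score

print(rangeAndIqr(examScores))   # (90, 41.25)
print(rangeAndIqr(withOutlier))  # (990, 42.5)
```

A single extreme value stretches the range from 90 to 990, while the IQR barely moves (41.25 to 42.5).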

Great! If you want, you can also watch the Video Solution: 👇


Solution

The first thing we’ll do is store the dataset in a list.

examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]

1. NumPy

Let’s first import NumPy.

import numpy as np

We know the Range is the difference between the maximum and minimum values.

rangeVal = np.max(examScores) - np.min(examScores)
print(rangeVal) # Output: 90

So, the Range value is 90. Great! To find the IQR, we first need Q1 and Q3. NumPy has a built-in percentile() function that we can use to find them.

q1 = np.percentile(examScores, 25)
q3 = np.percentile(examScores, 75)

Now, we’ll simply find the difference.

iqrVal = q3 - q1
print(iqrVal) # Output: 41.25

And here we got our IQR value, which is 41.25.
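If you’re curious where 41.25 comes from, here’s a minimal sketch of linear interpolation written by hand. The helper linearPercentile is hypothetical; it assumes the rule used by NumPy’s default ("linear") method, where the p-th percentile sits at 0-based position (n - 1) · p / 100 in the sorted data.

```python
def linearPercentile(values, p):
    """Percentile by linear interpolation (0 <= p <= 100).

    Hypothetical helper written to mirror NumPy's default "linear" method:
    the target sits at 0-based position (n - 1) * p / 100 in the sorted data.
    """
    data = sorted(values)
    pos = (len(data) - 1) * p / 100
    lower = int(pos)                       # index at or below the position
    upper = min(lower + 1, len(data) - 1)  # neighbour above (clamped at the end)
    frac = pos - lower                     # fraction of the way from lower to upper
    return data[lower] + (data[upper] - data[lower]) * frac


examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]

q1 = linearPercentile(examScores, 25)  # position 3.25: 40 + 0.25 * (50 - 40) = 42.5
q3 = linearPercentile(examScores, 75)  # position 9.75: 80 + 0.75 * (85 - 80) = 83.75
print(q1, q3, q3 - q1)                 # 42.5 83.75 41.25
```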

2. Pandas

Here, we’ll first import Pandas and create a Pandas Series.

import pandas as pd

scoreSeries = pd.Series(examScores)

Alright! In Pandas, we have a quantile() method that we can use to find Q1 and Q3.

q1 = scoreSeries.quantile(0.25)
q3 = scoreSeries.quantile(0.75)

We know IQR is the difference between Q3 and Q1.

iqrVal = q3 - q1
print(iqrVal) # Output: 41.25

There we go! Pandas gives the same result as NumPy, because its quantile() method also uses linear interpolation by default.

Here, I didn’t solve for the Range, as we’d do it the same way we did in NumPy: a Pandas Series has max() and min() methods too.

We can also solve it using a Pandas DataFrame with the same logic. You can try that later as well.
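If you’d like a starting point for that, here’s a minimal sketch. The column name "score" is just an illustrative choice. On a DataFrame, max(), min() and quantile() work column by column and each returns a Series indexed by column name.

```python
import pandas as pd

examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]

# A one-column DataFrame; "score" is an illustrative column name
scoreDf = pd.DataFrame({"score": examScores})

# Column-wise results: each expression is a Series indexed by column name
rangePerColumn = scoreDf.max() - scoreDf.min()
iqrPerColumn = scoreDf.quantile(0.75) - scoreDf.quantile(0.25)

print(rangePerColumn["score"])  # 90
print(iqrPerColumn["score"])    # 41.25
```

If the DataFrame had more numeric columns (say, one per subject), the same two lines would give the range and IQR of every column at once.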

Well, to find the Range, we can also use Python’s built-in max() and min() functions.

rangeVal = max(examScores) - min(examScores)
print(rangeVal) # Output: 90

3. Statistics

Alright! Let’s first sort our dataset and store the length of it.

sortedScores = sorted(examScores)
n = len(sortedScores)

Great! Now, let’s split the dataset into two halves. Since n = 14 is even, each half gets 7 scores.

firstHalf = sortedScores[:n//2]
secondHalf = sortedScores[n//2:]
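With an odd number of scores, the same n//2 split would put the middle value into the second half. Here’s a minimal sketch of the halves method as one function. The name medianOfHalvesIqr, the excludeMedian option and the 15th score of 55 are hypothetical; leaving the median out of both halves is just one common textbook convention.

```python
import statistics as st


def medianOfHalvesIqr(scores, excludeMedian=True):
    """IQR as median(upper half) - median(lower half).

    Hypothetical helper. When len(scores) is odd and excludeMedian is True,
    the middle value is left out of both halves; with excludeMedian=False it
    falls into the upper half, exactly as a plain n // 2 split does.
    """
    data = sorted(scores)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle element only for an odd count with exclusion enabled
    start = half + 1 if (n % 2 == 1 and excludeMedian) else half
    upper = data[start:]
    return st.median(upper) - st.median(lower)


examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]
print(medianOfHalvesIqr(examScores))  # 45 (n = 14 is even, so both options agree)

oddScores = examScores + [55]  # hypothetical 15th score
print(medianOfHalvesIqr(oddScores))                       # 45
print(medianOfHalvesIqr(oddScores, excludeMedian=False))  # 42.5
```

For odd-sized data the two options disagree, so it’s worth deciding on one convention up front.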

Awesome! Now, it’s time to import Python’s statistics module, which we’ll use to find the medians of the two halves.

import statistics as st

Okay, we can now easily calculate Q1 and Q3.

q1 = st.median(firstHalf)
q3 = st.median(secondHalf)

Now that we already have Q1 and Q3, we’ll simply find their difference.

iqrVal = q3 - q1
print(iqrVal) # Output: 45

Ohh! We’re not getting the same IQR value we were getting earlier. What went wrong then?

Well, nothing actually went wrong: the difference comes from the methods we used to calculate the quartiles. NumPy (and Pandas, by default) uses linear interpolation between data points, which gives Q1 = 42.5 and Q3 = 83.75. In the last method, we found Q1 and Q3 as the medians of the two halves, which gives Q1 = 40 and Q3 = 85. So, this result differs slightly from NumPy’s.
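To make the comparison concrete, here’s a minimal sketch that computes the IQR with three quartile conventions side by side, using only the statistics module. method="inclusive" follows the same rule as NumPy’s default linear interpolation, method="exclusive" is the statistics module’s own default, and the halves approach is the one we just used. The helper name iqrFromQuartiles is hypothetical.

```python
import statistics as st

examScores = [65, 75, 85, 95, 50, 70, 80, 90, 60, 100, 40, 30, 20, 10]


def iqrFromQuartiles(quartiles):
    # quartiles is the [Q1, Q2, Q3] list returned by st.quantiles(..., n=4)
    q1, _, q3 = quartiles
    return q3 - q1


# Linear interpolation at 0-based position (n - 1) * p: NumPy's default rule
inclusive = iqrFromQuartiles(st.quantiles(examScores, n=4, method="inclusive"))

# Interpolation at 1-based position (n + 1) * p: the statistics module's default
exclusive = iqrFromQuartiles(st.quantiles(examScores, n=4, method="exclusive"))

# Medians of the lower and upper halves, as in the Statistics method above
sortedScores = sorted(examScores)
half = len(sortedScores) // 2
halves = st.median(sortedScores[half:]) - st.median(sortedScores[:half])

print(inclusive, exclusive, halves)  # 41.25 48.75 45
```

Same 14 scores, three slightly different IQRs, purely because each convention places the quartiles at slightly different positions.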

No need to worry about it, as each of these approaches is a valid way of defining quartiles. Just choose one and use it consistently.

And that’s all for today’s article. I hope this article was helpful. If you have any doubts, feel free to ask me.

Thanks for reading! 😊

Written by NIBEDITA (NS)

Tech enthusiast, Content Writer and lifelong learner!
