2.3 Measures of the Location of the Data
The common measures of location are quartiles and percentiles.
Quartiles are special percentiles.
- The first quartile, Q1, is the same as the 25th percentile.
25% of data will be less than the 25th percentile; 75% of data will be more than the 25th percentile.
- The second quartile, Q2, is the same as the 50th percentile, or median.
50% of data will be less than the 50th percentile; 50% of data will be more than the 50th percentile.
- The third quartile, Q3, is the same as the 75th percentile.
75% of data will be less than the 75th percentile; 25% of data will be more than the 75th percentile.
The general form is:
n% of data will be less than the nth percentile and (100% – n%) of data will be more than the nth percentile.
To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. To score in the 90th percentile of an exam does not mean, necessarily, that you received 90% on a test. It means that 90% of test scores are the same or less than your score and 10% of the test scores are the same or greater than your test score.
Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the 75th percentile. That translates into a score of at least 1220. To be admitted as a Duke student, your SAT score has to be at least 1220.
Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.
The median is a number that measures the “center” of the data. You can think of the median as the “middle value,” but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger.
For example, consider the following data.
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
Since there are 14 observations, the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two.
[latex]\displaystyle\frac{{{6.8}+{7.2}}}{{2}}={7}[/latex]
The median is seven.
50% of the values are smaller than 7 and 50% of the values are larger than 7.
Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data.
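To find the quartiles, first find the median, or second quartile. The first quartile, Q1, is the middle value of the lower half of the data, and the third quartile, Q3, is the middle value of the upper half of the data.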
To get the idea, consider the same data set:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
- The median, or second quartile, falls between the 7th and 8th data values. The median is the mean of 6.8 and 7.2.
Hence, the median = [latex]\frac{6.8 + 7.2}{2}[/latex] = 7
- The lower half of the data is 1, 1, 2, 2, 4, 6, 6.8. The middle value of the lower half is 2.
1; 1; 2; 2; 4; 6; 6.8
The number 2, which is part of the data, is the first quartile.
25% of the entire set of values are the same as or less than 2. 75% of the values are more than 2.
- The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is 9.
The third quartile, Q3, is 9.
75% of the ordered data set are less than 9. 25% of the ordered data set are greater than 9.
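The median-of-halves procedure above can be sketched in Python. This is a minimal illustration, not a library routine: the function names `median` and `quartiles` are chosen here for the example, and other textbooks and software use slightly different quartile conventions that can give different answers.

```python
def median(ordered):
    """Median of a list that is already sorted from smallest to largest."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # lower half of the ordered data
    upper = ordered[half + n % 2:]   # upper half; the middle value is skipped when n is odd
    return median(lower), median(ordered), median(upper)


data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # Q1 = 2, median = 7, Q3 = 9, as found above
```

When the number of observations is odd, this sketch leaves the median out of both halves, which is the convention the worked examples in this section follow.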
The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1).
IQR = Q3 – Q1
The IQR can help determine outliers.
| A data value is a potential outlier if and only if it is [latex]\begin{cases}\text{smaller than Q1 - 1.5 * IQR}\\\text{or}\\\text{larger than Q3 + 1.5 * IQR}\end{cases}[/latex] |
Example 1
For the following 11 salaries, calculate the IQR and determine if any salaries are outliers. The salaries are in dollars.
$33,000 $64,500 $28,000 $54,000 $72,000 $68,500 $69,000 $42,000 $54,000 $120,000 $40,500
- Find the median, Q1, and Q3.
Show Answer
Order the data from smallest to largest.
$28,000 $33,000 $40,500 $42,000 $54,000 $54,000 $64,500 $68,500 $69,000 $72,000 $120,000
Median = $54,000
Q1 = $40,500
Q3 = $69,000
- Find the interquartile range, IQR.
Show Answer
IQR = $69,000 – $40,500 = $28,500
- Find the potential outlier.
Show Answer
(1.5)(IQR) = (1.5)($28,500) = $42,750
Q1 – (1.5)(IQR) = $40,500 – $42,750 = –$2,250
Q3 + (1.5)(IQR) = $69,000 + $42,750 = $111,750
No salary is less than –$2,250.
However, $120,000 is more than $111,750, so $120,000 is a potential outlier.
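The outlier check in this example can be sketched in Python as follows. The helper names `iqr_fences` and `potential_outliers` are invented for this illustration, and the quartiles are passed in by hand using the values found in the example.

```python
def iqr_fences(q1, q3, factor=1.5):
    """Return the cutoffs (Q1 - factor*IQR, Q3 + factor*IQR)."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def potential_outliers(data, q1, q3):
    """Values strictly below the lower cutoff or strictly above the upper cutoff."""
    lower, upper = iqr_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


salaries = [33000, 64500, 28000, 54000, 72000, 68500,
            69000, 42000, 54000, 120000, 40500]

# Q1 and Q3 are taken from the worked answer above.
lower, upper = iqr_fences(q1=40500, q3=69000)
print(lower, upper)                                 # -2250.0 111750.0
print(potential_outliers(salaries, 40500, 69000))   # [120000]
```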
Try It
Test Scores for Class A
69; 96; 81; 79; 34; 76; 83; 99; 89; 67; 90; 77; 85; 98; 66; 91; 77; 69; 80; 94
Test Scores for Class B
90; 72; 80; 92; 90; 97; 92; 75; 79; 39; 70; 80; 129; 95; 78; 73; 71; 68; 95; 134
1. Find the interquartile range (IQR) for the two data sets above and compare them.
Show Answer
Class A
Order the data from smallest to largest.
34; 66; 67; 69; 69; 76; 77; 77; 79; 80; 81; 83; 85; 89; 90; 91; 94; 96; 98; 99
[latex]\displaystyle\text{Median}=\frac{{{80}+{81}}}{{2}}={80.5}[/latex]
[latex]{Q}_{{1}}=\frac{{{69}+{76}}}{{2}}={72.5}[/latex]
[latex]{Q}_{{3}}=\frac{{{90}+{91}}}{{2}}={90.5}[/latex]
IQR = 90.5 – 72.5 = 18
Class B
Order the data from smallest to largest.
39; 68; 70; 71; 72; 73; 75; 78; 79; 80; 80; 90; 90; 92; 92; 95; 95; 97; 129; 134
[latex]\displaystyle\text{Median}=\frac{{{80}+{80}}}{{2}}={80}[/latex]
[latex]{Q}_{{1}}=\frac{{{72}+{73}}}{{2}}={72.5}[/latex]
[latex]{Q}_{{3}}=\frac{{{92}+{95}}}{{2}}={93.5}[/latex]
IQR = 93.5 – 72.5 = 21
The data for Class B has a larger IQR, so the middle 50% of the scores (those between Q1 and Q3) are more spread out for Class B and less clustered about the median.
2. Is there any outlier in class A?
Show Answer
Q1 – (1.5)(IQR) = 72.5 – (1.5)(18) = 45.5
Q3 + (1.5)(IQR) = 90.5 + (1.5)(18) = 117.5
In class A, we have these data:
34; 66; 67; 69; 69; 76; 77; 77; 79; 80; 81; 83; 85; 89; 90; 91; 94; 96; 98; 99.
34 is less than Q1 – (1.5)(IQR), 45.5.
No data is greater than Q3 + (1.5)(IQR) = 117.5.
So, 34 is the only potential outlier in class A.
3. Is there any outlier in class B?
Show Answer
Q1 – (1.5)(IQR) = 72.5 – (1.5)(21) = 41
Q3 + (1.5)(IQR) = 93.5 + (1.5)(21) = 125
In class B, we have these data:
39; 68; 70; 71; 72; 73; 75; 78; 79; 80; 80; 90; 90; 92; 92; 95; 95; 97; 129; 134.
39 is less than Q1 – (1.5)(IQR), 41.
129 and 134 are greater than Q3 + (1.5)(IQR), 125.
So 39, 129 and 134 are the potential outliers in class B.
Example 2
Consider 13 real estate prices (in dollars), listed in order in the answer below.
Calculate the IQR and determine if any prices are potential outliers.
Show Answer
Order the data from smallest to largest.
114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000
Median = 488,800
Q1 = [latex]\frac{230,500+387,000}{2}[/latex] = 308,750
Q3 =[latex]\frac{639,000+659,000}{2}[/latex] = 649,000
IQR = 649,000 – 308,750 = 340,250
(1.5)(IQR) = (1.5)(340,250) = 510,375
Q1 – (1.5)(IQR) = 308,750 – 510,375 = –201,625
Q3 + (1.5)(IQR) = 649,000 + 510,375 = 1,159,375
No house price is less than –201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
A Formula for Finding the kth Percentile
If you were to do a little research, you would find several formulas for calculating the kth percentile. Here is one of them.
k = the kth percentile. It may or may not be part of the data.
i = the index (ranking or position of a data value)
n = the total number of data
- Order the data from smallest to largest.
- Calculate [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}[/latex]
- If i is an integer, then the kth percentile is the data value in the ith position in the ordered set of data.
- If i is not an integer, then round i up and round i down to the nearest integers. Average the two data values in these two positions in the ordered data set. This is easier to understand in an example.
Example 3
Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were:
| Amount of Sleep per School Night (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 4 | 2 | 0.04 | 0.04 |
| 5 | 5 | 0.10 | 0.14 |
| 6 | 7 | 0.14 | 0.28 |
| 7 | 12 | 0.24 | 0.52 |
| 8 | 14 | 0.28 | 0.80 |
| 9 | 7 | 0.14 | 0.94 |
| 10 | 3 | 0.06 | 1.00 |
(a) Find the 28th percentile.
Show Answer
Notice the 0.28 in the “cumulative relative frequency” column. Twenty-eight percent of 50 data values is 14 values. There are 14 values less than the 28th percentile. They include the two 4s, the five 5s, and the seven 6s. The 28th percentile is between the last six and the first seven. The 28th percentile is 6.5.
(b) Find the median.
Show Answer
Look again at the “cumulative relative frequency” column and find 0.52. The median is the 50th percentile or the second quartile. 50% of 50 is 25. There are 25 values less than the median. They include the two 4s, the five 5s, the seven 6s, and eleven of the 7s. The median or 50th percentile is between the 25th, or seven, and 26th, or seven, values. The median is seven.
(c) Find the third quartile.
Show Answer
The third quartile is the same as the 75th percentile. You can “eyeball” this answer. If you look at the “cumulative relative frequency” column, you find 0.52 and 0.80. When you have all the 4s, 5s, 6s, and 7s, you have 52% of the data. When you include all the 8s, you have 80% of the data. The 75th percentile, then, must be an eight. Another way to look at the problem is to find 75% of 50, which is 37.5, and round up to 38. The third quartile, Q3, is the 38th value, which is an eight. You can check this answer by counting the values. (There are 37 values below the third quartile and 12 values above.)
Try It
Forty bus drivers were asked how many hours they spend each day running their routes (rounded to the nearest hour).
| Amount of time spent on route (hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 2 | 12 | 0.30 | 0.30 |
| 3 | 14 | 0.35 | 0.65 |
| 4 | 10 | 0.25 | 0.90 |
| 5 | 4 | 0.10 | 1.00 |
Find the 65th percentile.
Show Answer
The 65th percentile is between the last three and the first four. The 65th percentile is 3.5.
Example 4
| Amount of Sleep per School Night (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 4 | 2 | 0.04 | 0.04 |
| 5 | 5 | 0.10 | 0.14 |
| 6 | 7 | 0.14 | 0.28 |
| 7 | 12 | 0.24 | 0.52 |
| 8 | 14 | 0.28 | 0.80 |
| 9 | 7 | 0.14 | 0.94 |
| 10 | 3 | 0.06 | 1.00 |
- Find the 80th percentile.
Show Answer
The 80th percentile is between the last eight and the first nine in the table (between the 40th and 41st values). Therefore, we need to take the mean of the 40th and 41st values. The 80th percentile = [latex]\displaystyle\frac{{{8}+{9}}}{{2}}={8.5}[/latex]
- Find the 90th percentile.
Show Answer
The 90th percentile will be the 45th data value (location is 0.90(50) = 45) and the 45th data value is nine.
- Find the first quartile. What is another name for the first quartile?
Show Answer
Q1 is also the 25th percentile. The 25th percentile location calculation:
Location of the 25th percentile = 0.25(50) = 12.5 ≈ 13, so use the 13th data value.
Thus, the 25th percentile is six.
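The worked answers for the sleep table appear to follow one counting rule: take k% of n as a position, average that value with the next one when the position is a whole number, and round the position up otherwise. The Python sketch below makes that rule explicit. It is an assumption drawn from the answers, not a rule the text states in this form, and the names `expand` and `percentile_by_count` are invented for the illustration.

```python
import math


def expand(freq_table):
    """Turn {value: frequency} into the full ordered list of observations."""
    ordered = []
    for value in sorted(freq_table):
        ordered.extend([value] * freq_table[value])
    return ordered


def percentile_by_count(ordered, k):
    """kth percentile located at position k% of n (positions are 1-based)."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    n = len(ordered)
    pos = k * n / 100
    if pos == int(pos):
        # Exactly pos values lie below: average the values at pos and pos + 1.
        i = int(pos)
        return (ordered[i - 1] + ordered[i]) / 2
    # Otherwise round the position up and take that value.
    return ordered[math.ceil(pos) - 1]


# Frequencies from the sleep table (hours: number of students).
sleep_hours = expand({4: 2, 5: 5, 6: 7, 7: 12, 8: 14, 9: 7, 10: 3})

for k in (25, 28, 50, 75, 80, 90):
    print(k, percentile_by_count(sleep_hours, k))
```

Run on the sleep data, the sketch reproduces the values found in Examples 3 and 4 (6, 6.5, 7, 8, 8.5, and 9).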
Try It
| Amount of time spent on route (hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 2 | 12 | 0.30 | 0.30 |
| 3 | 14 | 0.35 | 0.65 |
| 4 | 10 | 0.25 | 0.90 |
| 5 | 4 | 0.10 | 1.00 |
Find the third quartile. What is another name for the third quartile?
Show Answer
The third quartile is the 75th percentile, which is four. The 65th percentile is between three and four, and the 90th percentile is between four and five. The 75th percentile lies between the 65th and 90th percentiles, so it must be four.
Example 5
Listed are 29 ages for Academy Award winning best actors in order from smallest to largest.
18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77
- Find the 70th percentile.
Show Answer
k = 70th percentile, i = the index, n = 29
[latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}={(\frac{{70}}{{100}})}{({29}+{1})}={21}[/latex].
Twenty-one is an integer, and the data value in the 21st position in the ordered data set is 64.
The 70th percentile is 64 years.
- Find the 83rd percentile.
Show Answer
k = 83rd percentile, i = the index, n = 29
[latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}={(\frac{{83}}{{100}})}{({29}+{1})}={24.9}[/latex], which is NOT an integer. Round it down to 24 and up to 25.
The age in the 24th position is 71 and the age in the 25th position is 72.
We will find the average of 71 and 72.
The 83rd percentile is 71.5 years.
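The index formula used in this example, i = (k/100)(n + 1), can be sketched in Python. The function name `kth_percentile` is chosen for the illustration. As the text notes, this is only one of several percentile formulas, so statistical software may return slightly different values.

```python
import math


def kth_percentile(data, k):
    """kth percentile using the index i = (k / 100) * (n + 1), with 1-based positions."""
    ordered = sorted(data)
    n = len(ordered)
    i = k * (n + 1) / 100
    lo, hi = math.floor(i), math.ceil(i)
    if lo < 1 or hi > n:
        raise ValueError("index falls outside the data for this k")
    if lo == hi:
        return ordered[lo - 1]                      # i is an integer: take that position
    return (ordered[lo - 1] + ordered[hi - 1]) / 2  # otherwise average the two neighbors


ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

print(kth_percentile(ages, 70))  # 64
print(kth_percentile(ages, 83))  # 71.5
```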
Try It
Listed are 29 ages for Academy Award winning best actors in order from smallest to largest.
18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77
Calculate
- the 20th percentile
Show Answer
k = 20. Index = [latex]\displaystyle{i}=\frac{{k}}{{100}}{({n}+{1})}=\frac{{20}}{{100}}{({29}+{1})}={6}[/latex] The age in the sixth position is 27. The 20th percentile is 27 years.
- the 55th percentile.
Show Answer
k = 55. Index, i = [latex]\frac{{k}}{{100}}{({n}+{1})}=\frac{{55}}{{100}}{({29}+{1})}={16.5}[/latex]. Round down to 16 and up to 17. The age in the 16th position is 52 and the age in the 17th position is 55. The average of 52 and 55 is 53.5. The 55th percentile is 53.5 years.
A Formula for Finding the Percentile of a Value in a Data Set
- Order the data from smallest to largest.
- x = the number of data values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile.
- y = the number of data values equal to the data value for which you want to find the percentile.
- n = the total number of data.
- Calculate [latex]\displaystyle\frac{{{x}+{0.5}{y}}}{{n}}{({100})}[/latex]. Then round to the nearest integer.
Example 6
Listed are 29 ages for Academy Award winning best actors in order from smallest to largest.
18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77
- Find the percentile for 58.
- Find the percentile for 25.
Solution:
- Counting from the bottom of the list, there are 18 data values less than 58. There is one value of 58.
Number of data values counting from the bottom of the data list up to but not including the data value 58, x = 18
Number of data values equal to the data value 58, y = 1
[latex]\frac{{x}+{0.5}{y}}{n}{(100)}=\frac{{18}+{0.5}{(1)}}{29}{(100)}={63.79}[/latex]
58 is the 64th percentile.
- Counting from the bottom of the list, there are three data values less than 25. There is one value of 25.
Number of data values counting from the bottom of the data list up to but not including the data value 25, x = 3
Number of data values equal to the data value 25, y = 1
[latex]\frac{{x}+{0.5}{y}}{n}{(100)}=\frac{{3}+{0.5}{(1)}}{29}{(100)}={12.07}[/latex]
25 is the 12th percentile.
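The expression (x + 0.5y)/n · 100 can be sketched in Python as below. The function name `percentile_of_value` is invented for the illustration. It returns the unrounded result so that the rounding step stays visible.

```python
def percentile_of_value(data, value):
    """Percentile of `value` in `data` using (x + 0.5 * y) / n * 100."""
    x = sum(1 for d in data if d < value)    # values below the one of interest
    y = sum(1 for d in data if d == value)   # values equal to the one of interest
    n = len(data)
    if y == 0:
        raise ValueError("value does not appear in the data")
    return (x + 0.5 * y) / n * 100


ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47,
        52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

for value in (58, 25):
    p = percentile_of_value(ages, value)
    print(value, round(p, 2), round(p))  # unrounded to two places, then nearest integer
```

Python's built-in `round` sends exact halves to the nearest even integer, so a result ending in exactly .5 would need separate handling if the usual round-half-up convention is wanted.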
Try It
Listed are 30 ages for Academy Award winning best actors in order from smallest to largest.
18; 21; 22; 25; 26; 27; 29; 30; 31; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77
Find the percentile for 47.
Show Answer
Percentile for 47: Counting from the bottom of the list, there are 15 data values less than 47. There is one value of 47.
x = 15 and y = 1
[latex]\displaystyle\frac{{x}+{0.5}{y}}{n}{(100)}=\frac{{15}+{0.5}{(1)}}{30}{(100)}={51.67}[/latex]
47 is the 52nd percentile.
Find the percentile for 31.
Show Answer
Percentile for 31: Counting from the bottom of the list, there are eight data values less than 31. There are two values of 31.
x = 8 and y = 2
[latex]\displaystyle\frac{{x}+{0.5}{y}}{n}{(100)}=\frac{{8}+{0.5}{(2)}}{30}{(100)}={30.00}[/latex]
31 is the 30th percentile.
Interpreting Percentiles, Quartiles, and Median
A percentile indicates the relative standing of a data value when data are sorted into numerical order from smallest to largest. p percent of data values are less than or equal to the pth percentile. For example, 15% of data values are less than or equal to the 15th percentile.
- Low percentiles always correspond to lower data values.
- High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgment about whether it is “good” or “bad.” The interpretation of whether a certain percentile is “good” or “bad” depends on the context of the situation to which the data applies. In some situations, a low percentile would be considered “good;” in other contexts a high percentile might be considered “good”. In many situations, there is no value judgment that applies.
Understanding how to interpret percentiles properly is important not only when describing data, but also when calculating probabilities in later chapters of this text.
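Guideline
When writing the interpretation of a percentile in the context of the given data, the sentence should contain the following information.
- Information about the context of the situation being considered
- The data value (value of the variable) that represents the percentile
- The percent of individuals or items with data values below the percentile
- The percent of individuals or items with data values above the percentile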
Example 7
On a timed math test, the first quartile for time it took to finish the exam was 35 minutes.
Interpret the first quartile in the context of this situation.
Solution:
- 25% of students finished the exam in 35 minutes or less.
- 75% of students finished the exam in 35 minutes or more.
- A low percentile could be considered good, as finishing more quickly on a timed exam is desirable.
(If you take too long, you might not be able to finish.)
Try It
For the 100-meter dash, the third quartile for times for finishing the race was 11.5 seconds.
Interpret the third quartile in the context of the situation.
Show Answer
25% of runners finished the race in 11.5 seconds or more. 75% of runners finished the race in 11.5 seconds or less. A lower percentile is good because finishing a race more quickly is desirable.
Example 8
On a 20 question math test, the 70th percentile for number of correct answers was 16.
Interpret the 70th percentile in the context of this situation.
Solution:
- 70% of students answered 16 or fewer questions correctly.
- 30% of students answered 16 or more questions correctly.
- A higher percentile could be considered good, as answering more questions correctly is desirable.
Try It
On a 60 point written assignment, the 80th percentile for the number of points earned was 49.
Interpret the 80th percentile in the context of this situation.
Show Answer
80% of students earned 49 points or fewer. 20% of students earned 49 or more points. A higher percentile is good because getting more points on an assignment is desirable.
Example 9
At a community college, it was found that the 30th percentile of credit units that students are enrolled for is 7 units.
Interpret the 30th percentile in the context of this situation.
Solution:
- 30% of students are enrolled in 7 or fewer credit units.
- 70% of students are enrolled in 7 or more credit units.
- In this example, there is no “good” or “bad” value judgment associated with a higher or lower percentile. Students attend community college for varied reasons and needs, and their course load varies according to their needs.
Try It
During a season, the 40th percentile for points scored per player in a game is eight. Interpret the 40th percentile in the context of this situation.
Show Answer
40% of players scored eight points or fewer. 60% of players scored eight points or more.
A higher percentile is good because getting more points in a basketball game is desirable.
Example 10
Sharpe Middle School is applying for a grant that will be used to add fitness equipment to the gym. The principal surveyed 15 anonymous students to determine how many minutes a day the students spend exercising. The results from the 15 anonymous students are shown.
0 minutes; 40 minutes; 60 minutes; 30 minutes; 60 minutes
10 minutes; 45 minutes; 30 minutes; 300 minutes; 90 minutes;
30 minutes; 120 minutes; 60 minutes; 0 minutes; 20 minutes
Determine the following five values.
- Min = 0
- Q1 = 20
- Med = 40
- Q3 = 60
- Max = 300
Solution:
If you were the principal, would you be justified in purchasing new fitness equipment? Since 75% of the students exercise for 60 minutes or less daily, and since the IQR is 40 minutes (60 – 20 = 40), we know that half of the students surveyed exercise between 20 minutes and 60 minutes daily. This seems a reasonable amount of time spent exercising, so the principal would be justified in purchasing the new equipment.
However, the principal needs to be careful. The value 300 appears to be a potential outlier.
Q3 + 1.5(IQR) = 60 + (1.5)(40) = 120.
The value 300 is greater than 120 so it is a potential outlier. If we delete it and calculate the five values, we get the following values:
- Min = 0
- Q1 = 20
- Med = 35
- Q3 = 60
- Max = 120
We still have 75% of the students exercising for 60 minutes or less daily and half of the students exercising between 20 and 60 minutes a day. However, 15 students is a small sample and the principal should survey more students to be sure of his survey results.
Concept Review
The values that divide a rank-ordered set of data into 100 equal parts are called percentiles. Percentiles are used to compare and interpret data. For example, an observation at the 50th percentile would be greater than 50 percent of the other observations in the set. Quartiles divide data into quarters. The first quartile (Q1) is the 25th percentile, the second quartile (Q2 or median) is the 50th percentile, and the third quartile (Q3) is the 75th percentile. The interquartile range, or IQR, is the range of the middle 50 percent of the data values. The IQR is found by subtracting Q1 from Q3, and can help determine outliers by using the following two expressions.
- Q3 + IQR(1.5)
- Q1 – IQR(1.5)
Formula Review
[latex]\displaystyle{i}={(\frac{{k}}{{100}})}{({n}+{1})}[/latex]
where
i = the ranking or position of a data value,
k = the kth percentile,
n = total number of data.
Expression for finding the percentile of a data value:
[latex]\displaystyle{(\frac{{{x}+{0.5}{y}}}{{n}})}{({100})}[/latex]
where
x = the number of values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile,
y = the number of data values equal to the data value for which you want to find the percentile,
n = total number of data