Friday, May 30, 2014

Measures of Variability

The mean alone does not provide a complete or sufficient description of data. In this section we present descriptive numbers that measure the variability or spread of the observations from the mean. In particular, we include
(i)    Range
(ii)    Interquartile range
(iii)    Variance
(iv)    Standard deviation and
(v)    Coefficient of variation

No two things are exactly alike. This is one of the basic principles of statistical quality control. Variation exists in all areas. The weather varies greatly from day to day, and even from hour to hour; grades on a test differ for students taking the same course with the same instructor; a person’s blood pressure, pulse, cholesterol level, and caloric intake will vary daily.

While two data sets could have the same mean, the individual observations in one set could vary more from the mean than do the observations in the second set.  Consider the following two sets of sample data:


Sample A:    1    2    1   36
Sample B:    8    9   10   13

Although the mean is 10 for both samples, the data in sample A are clearly farther from 10 than are the data in sample B. We need descriptive numbers to measure this spread.
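As a quick check of the claim above, the following minimal Python sketch computes the mean of each sample and the deviation of every observation from that mean. The variable names are illustrative only; the data are the two samples given in the text.

```python
# Samples A and B from the text
sample_a = [1, 2, 1, 36]
sample_b = [8, 9, 10, 13]

# Arithmetic mean: sum of the observations divided by their count
mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

# Deviation of each observation from its own sample mean
deviations_a = [x - mean_a for x in sample_a]
deviations_b = [x - mean_b for x in sample_b]

print("means:", mean_a, mean_b)
print("deviations, sample A:", deviations_a)
print("deviations, sample B:", deviations_b)
```

Both means agree, yet the deviations in sample A are much larger in absolute value than those in sample B, which is the spread that the measures below try to capture.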

Range

Range is the difference between the largest and smallest observations. The greater the spread of the data from the center of the distribution, the larger the range will be. Although the range measures the total spread of the data, it takes into account only the largest and smallest observations, so it may be an unsatisfactory measure of variability: a single outlier, whether very high or very low, can distort it considerably. One way to avoid this difficulty is to arrange the data in ascending or descending order, discard a few of the highest and a few of the lowest numbers, and find the range of those remaining.
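The range, and the trimmed variant just described, can be sketched in Python as follows. The function names and the choice to discard the same number k of observations from each end are assumptions made for illustration; the text only says to discard "a few" from each end.

```python
def data_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


def trimmed_range(values, k):
    """Range after discarding the k smallest and k largest observations.

    Assumption: the same number k is dropped from each end.
    """
    ordered = sorted(values)
    if k < 0 or 2 * k >= len(ordered):
        raise ValueError("k must leave at least one observation")
    kept = ordered[k:len(ordered) - k]
    return kept[-1] - kept[0]


sample_a = [1, 2, 1, 36]   # sample A from the text
sample_b = [8, 9, 10, 13]  # sample B from the text

print(data_range(sample_a))        # dominated by the single value 36
print(data_range(sample_b))
print(trimmed_range(sample_a, 1))  # drops one value from each end
```

For sample A the full range is driven almost entirely by the observation 36; dropping one value from each end removes that influence.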

Interquartile Range

 The interquartile range (IQR) measures the spread in the middle 50% of the data; it is the difference between the observation at Q3, the third quartile (or 75th percentile), and the observation at Q1, the first quartile (or 25th percentile). Thus

                        IQR = Q3 – Q1

where Q3 is located in the 0.75(n + 1)th position when the data are in increasing order and Q1 is located in the 0.25(n + 1)th position when the data are in increasing order.
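The position rule above can be sketched in Python. When the position is fractional, this sketch interpolates linearly between the two neighbouring ordered observations; that interpolation is an assumption, although it agrees with the halfway values used in the worked example in this post. The function names and the eight data values are made up for illustration.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower = math.floor(position)  # whole part of the position
    frac = position - lower       # fractional part, 0 <= frac < 1
    if lower < 1 or lower > len(ordered):
        raise ValueError("position falls outside the data")
    if frac == 0:
        return ordered[lower - 1]
    if lower == len(ordered):
        raise ValueError("position falls outside the data")
    below, above = ordered[lower - 1], ordered[lower]
    # Assumed linear interpolation between the neighbouring observations
    return below + frac * (above - below)


def quartiles(values):
    """Return (Q1, Q3, IQR) using the 0.25(n + 1) and 0.75(n + 1) positions."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return q1, q3, q3 - q1


# Illustrative data only (not from the text): n = 8, so the positions are 2.25 and 6.75
example = [16, 2, 10, 6, 14, 4, 12, 8]
q1, q3, iqr = quartiles(example)
print(q1, q3, iqr)
```

Other textbooks and software packages use slightly different position rules, so quartiles computed elsewhere may differ a little from these.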

Five-Number Summary
The five-number summary refers to the five descriptive measures: minimum, first quartile, median, third quartile, and maximum. Clearly,

                            Minimum ≤ Q1 ≤ Median ≤ Q3 ≤ Maximum

Example: Waiting Times at Gilotti’s Grocery
Gilotti’s Grocery advertises that customers wait less than minutes to pay if they go through the Speedy Transaction Aisles. Figure 1 is a stem-and-leaf display for a sample of 25 waiting times (in seconds). Compute the five-number summary.
 


Figure 1: Waiting times at Gilotti’s Grocery


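Frequency    Stem    Leaf
    9          1     1 2 4 6 7 8 8 9 9
    9          2     1 2 2 2 4 6 8 9 9
    7          3     0 1 2 3 4
    2          4     0 2

Note: the first column appears to hold cumulative counts rather than per-row frequencies. Stem 3 has five leaves, and the 7 shown equals 5 + 2, the number of observations at or above that stem. The leaves total 9 + 9 + 5 + 2 = 25 observations, matching the sample size.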

Solution: From the stem-and-leaf display we see that the minimum time is 11 seconds and the maximum time is 42 seconds. The first quartile, Q1, is located in the 0.25(25 + 1)th = 6.5th ordered position, halfway between the 6th and 7th ordered observations; both are 18, so Q1 = 18 seconds. The third quartile, Q3, is located in the 0.75(25 + 1)th = 19.5th ordered position, halfway between the 19th observation (30) and the 20th observation (31), so Q3 = 30.5 seconds. The median is the 0.5(25 + 1)th = 13th ordered observation, which is 22 seconds. The range is 42 – 11 = 31 seconds and the interquartile range is 30.5 – 18 = 12.5 seconds; that is, the middle 50% of the data have a spread of only 12.5 seconds.
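The solution can be cross-checked with Python's standard library. The sketch below reads the 25 waiting times off the stem-and-leaf display and uses `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the i(n + 1)/4 positions and interpolates linearly, so it follows the same position rule as the text. The variable names are illustrative.

```python
from statistics import quantiles

# Waiting times in seconds, read from the stem-and-leaf display
waiting_times = [
    11, 12, 14, 16, 17, 18, 18, 19, 19,  # stem 1
    21, 22, 22, 22, 24, 26, 28, 29, 29,  # stem 2
    30, 31, 32, 33, 34,                  # stem 3
    40, 42,                              # stem 4
]

# n=4 asks for the three cut points that split the data into quarters
q1, median, q3 = quantiles(waiting_times, n=4, method="exclusive")

five_number_summary = (min(waiting_times), q1, median, q3, max(waiting_times))
total_range = max(waiting_times) - min(waiting_times)
iqr = q3 - q1

print("five-number summary:", five_number_summary)
print("range:", total_range)
print("interquartile range:", iqr)
```

The values agree with the hand calculation: 11, 18, 22, 30.5 and 42 for the five-number summary, a range of 31 seconds and an interquartile range of 12.5 seconds.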
