Sunday, May 25, 2014

TABLES AND GRAPHS FOR NUMERICAL DATA

A first step in any study is to define the problem. Data are then gathered, and the need for analysis becomes apparent. The statistical tools that convert data into information are the subject of this chapter.

Frequency Distribution

A frequency distribution for numerical data is a table that summarizes data by listing the classes in the left column and the number of observations in each class in the right column. However, the classes, or intervals, for a frequency distribution of numerical data are not as easily identifiable as the categories of categorical data.

Determining the intervals of a frequency distribution for numerical data requires answering several questions: How many intervals should be used? How wide should each interval be? Where does the first interval begin? Where does the last interval end? Some general rules for preparing frequency distributions make it easier to answer these questions, to summarize data, and to communicate results.

CONSTRUCTION OF A FREQUENCY DISTRIBUTION

Rule 1: Determine k, the number of intervals (classes). k may be determined by Sturges's rule, k = 1 + 3.322 log10(n), where n is the total number of observations.

The number of intervals (classes) used in a frequency distribution may be decided in a somewhat arbitrary manner.

Quick Guide to Approximate Number of Intervals for a Frequency Distribution. 

Practice and experience provide the best guidelines. Larger data sets may require more class intervals. If you select too few classes, the pattern and various characteristics of the data may be hidden. If you select too many classes, you will find that some of your intervals contain no observations or only very few.
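As a rough illustration of Rule 1, the sketch below computes a suggested number of intervals. It assumes the common textbook form of Sturges's rule, k = 1 + 3.322 log10(n), and rounds the result upward; the function name sturges_k is made up for this example.

```python
import math


def sturges_k(n):
    """Suggested number of intervals for n observations.

    Assumes Sturges's rule in the form k = 1 + 3.322 * log10(n),
    rounded up to the next whole number.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.322 * math.log10(n))


# 100 observations: 1 + 3.322 * 2 = 7.644, rounded up to 8 intervals
print(sturges_k(100))
```

The result is only a starting point. As noted above, the final choice of k remains a matter of judgment.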

Rule 2: Intervals should be the same width, w; the width is determined by the following:

w = (largest observation - smallest observation) / k

Both k and w should be rounded upward, possibly to the next larger integer. The interval width is often rounded to a convenient whole number to provide for easy interpretation.
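The width calculation in Rule 2 can be sketched in a few lines. This assumes the usual definition, the range of the data (largest observation minus smallest) divided by k, rounded up to a whole number; the function name interval_width and the sample values are hypothetical.

```python
import math


def interval_width(data, k):
    """Interval width: (largest - smallest) / k, rounded up to an integer."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return math.ceil((max(data) - min(data)) / k)


# Illustrative data only: the range is 58 - 12 = 46, and 46 / 5 = 9.2,
# which rounds up to a width of 10.
sample = [12, 15, 22, 37, 41, 58]
print(interval_width(sample, 5))
```

In practice the rounded value may be adjusted further to a convenient number such as 5 or 10.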

Rule 3: Intervals (classes) must be inclusive and non-overlapping. Each observation must belong to one and only one interval. Consider a frequency distribution for the ages of a particular group of people. If the frequency distribution contains the intervals “age 20 to age 30” and “age 30 to age 40,” to which of these two classes would a person age 30 belong?

The boundaries, or endpoints, of each class must be defined. To avoid overlapping, age intervals could be defined as “age 20 but less than age 30,” followed by “age 30 but less than age 40” and so on. Another possibility is to define the age intervals as “20-29”, “30-39” and so forth. Boundary selection is subjective. Simply be sure to define interval boundaries that promote a clear understanding and interpretation of the data. 
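The “age 20 but less than age 30” convention can be sketched in code. The example below counts observations in equal-width intervals that include the lower boundary and exclude the upper one, so a person age 30 falls only in the second interval. The function name frequency_distribution and the ages are made up for illustration.

```python
def frequency_distribution(data, start, width, k):
    """Count observations in k equal-width intervals [lower, upper).

    Each interval includes its lower boundary and excludes its upper
    boundary, so every observation belongs to exactly one interval.
    """
    counts = [0] * k
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < k:
            raise ValueError(f"{x} falls outside the defined intervals")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]


# Hypothetical ages; both people aged 30 are counted in "30 but less than 40".
ages = [20, 24, 29, 30, 30, 35, 39, 41]
table = frequency_distribution(ages, start=20, width=10, k=3)
for lower, upper, count in table:
    print(f"{lower} but less than {upper}: {count}")
```

Raising an error for out-of-range values is one possible design choice. It makes a poorly chosen first or last boundary visible instead of silently dropping observations.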

Two special frequency distributions are the cumulative frequency distribution and the relative cumulative frequency distribution.
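A minimal sketch of these two distributions follows, assuming the usual definitions: the cumulative frequency of an interval is the number of observations in that interval and all earlier ones, and the relative cumulative frequency is that count as a proportion of the total. The function names and the counts are illustrative.

```python
from itertools import accumulate


def cumulative_frequencies(counts):
    """Running total of the interval counts."""
    return list(accumulate(counts))


def relative_cumulative_frequencies(counts):
    """Running total expressed as a proportion of all observations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return [running / total for running in accumulate(counts)]


# Illustrative counts for three consecutive intervals
counts = [3, 4, 1]
print(cumulative_frequencies(counts))           # [3, 7, 8]
print(relative_cumulative_frequencies(counts))  # [0.375, 0.875, 1.0]
```

The last cumulative frequency always equals the total number of observations, and the last relative cumulative frequency always equals 1.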
