# What is a Frequency Distribution?

A frequency distribution is a simplified way to classify the raw data of a quantitative variable. It shows how the different values of a variable (here, the marks in statistics scored by a student) are distributed in different classes along with their corresponding class frequencies. In this case we have ten classes of marks: 0-5, 5-10, ..., 45-50. The term *Class Frequency* means the number of values in a particular class. For example, in the class 15-20 we find 7 values of marks from raw data in Table 3.1. They are 32, 27, 14, 34, 25, 19, 16. The frequency of the class 15-20 is thus 7. Notice, however, that 20, which occurs *twice* in the raw data, is not included in the class 15-20. Had it been included, the class frequency of 15-20 would have been 9 instead of 7.

Each class in a frequency distribution table has *Class Limits*. Class limits are the two ends of a class. The lowest value is called the *Lower Class Limit* and the highest value the *Upper Class Limit*. For example, the class limits for the class 30-35 are 30 and 35. Its lower class limit is 30 and its upper class limit is 35.

*Class Interval* or *Class Width* is the difference between the upper class limit and the lower class limit. For the class 30-35, the class interval is 5 (upper class limit *minus* lower class limit). The *Class Mid-Point* or *Class Mark* is the middle value of a class. It lies halfway between the lower class limit and the upper class limit of a class and can be calculated as follows:

$$\text{Class Mid-Point or Class Mark} = \frac{\text{Upper Class Limit} + \text{Lower Class Limit}}{2} \tag{1}$$

The class mark or mid-value of each class is used to represent the class. Once raw data are grouped into classes, individual observations are not used in further calculations. Instead, the class mark is used.

**TABLE 3.3**

**The Lower Class Limits, the Upper Class Limits and the Class Mark**

| Class | Frequency | Lower class limit | Upper class limit | Mid-value (or) Class mark |
|---|---|---|---|---|
| 0-5 | 1 | 0 | 5 | 2.5 |
| 5-10 | 8 | 5 | 10 | 7.5 |
| 10-15 | 6 | 10 | 15 | 12.5 |
| 15-20 | 7 | 15 | 20 | 17.5 |
| 20-25 | 21 | 20 | 25 | 22.5 |
| 25-30 | 23 | 25 | 30 | 27.5 |
| 30-35 | 19 | 30 | 35 | 32.5 |
| 35-40 | 6 | 35 | 40 | 37.5 |
| 40-45 | 5 | 40 | 45 | 42.5 |
| 45-50 | 4 | 45 | 50 | 47.5 |
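The class marks in Table 3.3 follow directly from equation (1). As a minimal sketch (the function name `class_mark` and the variable names are our own, chosen only for illustration), the calculation could be written as:

```python
def class_mark(lower_limit, upper_limit):
    """Mid-point of a class: (upper limit + lower limit) / 2, as in equation (1)."""
    return (upper_limit + lower_limit) / 2


# Class limits of Table 3.3: ten classes of width 5, from 0-5 up to 45-50.
classes = [(lower, lower + 5) for lower in range(0, 50, 5)]
class_marks = [class_mark(lower, upper) for lower, upper in classes]

for (lower, upper), mark in zip(classes, class_marks):
    print(f"{lower}-{upper}: class mark {mark}")
```

Each printed class mark should agree with the last column of Table 3.3.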

A *frequency curve* is a graphic representation of a frequency distribution. Fig. 3.1 shows the diagrammatic presentation of the frequency distribution of the data in our example above. To obtain the frequency curve, we plot the class marks on the X-axis and the frequencies on the Y-axis.

**Fig. 3.1:** *Diagrammatic Presentation of Frequency Distribution of Data.*

**Courtesy:** NCERT textbook

# How to prepare a Frequency Distribution?

While preparing a frequency distribution from the raw data of Table 3.1, the following four questions need to be addressed:

- How many classes should we have?
- What should be the size of each class?
- How should we determine the class limits?
- How should we get the frequency for each class?

# How many classes should we have?

To determine the number of classes, we first find out to what extent the variable in hand changes in value. Such variation in the variable's value is captured by its *range*. The *range is the difference between the largest and the smallest values of the variable.* A large range indicates that the values of the variable are widely spread. On the other hand, a small range indicates that the values of the variable are spread narrowly. In our example the range of the variable “marks of a student” is 100 because the minimum marks are 0 and the maximum marks 100. It indicates that the variable has a large variation.

Once we obtain the value of the range, we decide the class interval, and it is then easy to determine the number of classes. Note that the *range is the sum of all class intervals.* If the class intervals are equal, then the range is the product of the number of classes and the class interval of a single class.

$$\text{Range} = \text{Number of Classes} \times \text{Class Interval} \tag{2}$$

Given the value of the range, the number of classes would be large if we choose small class intervals. A frequency distribution with too many classes would look too large. Such a distribution is not easy to handle. So we want to have a reasonably compact set of data. On the other hand, given the value of the range, if we choose a class interval that is too large then the number of classes becomes too small. The data set then may be too compact and we may not like the loss of information about its diversity. For example, suppose the range is 50 and the class interval is 25. Then the number of classes would be just 2 (i.e. 50/25 = 2). Though there is no hard-and-fast rule to determine the number of classes, the rule of thumb often used is that the number of classes should be between 5 and 15. In our example we have chosen to have 10 classes. Since the range is 100 and the class interval is 10, the number of classes is 100/10 = 10.
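Equation (2) can be rearranged to give the number of classes for a chosen class interval. The sketch below is only an illustration under the assumption of equal class intervals; the helper names `number_of_classes` and `within_rule_of_thumb` are hypothetical.

```python
def number_of_classes(value_range, class_interval):
    """Equation (2) rearranged: number of classes = range / class interval."""
    return value_range / class_interval


def within_rule_of_thumb(n_classes, lowest=5, highest=15):
    """Check the rule of thumb that the number of classes lies between 5 and 15."""
    return lowest <= n_classes <= highest


too_few = number_of_classes(50, 25)    # range 50, class interval 25
chosen = number_of_classes(100, 10)    # marks from 0 to 100, class interval 10

print(too_few, within_rule_of_thumb(too_few))
print(chosen, within_rule_of_thumb(chosen))
```

With a range of 50 and a class interval of 25 we get only 2 classes, which falls outside the rule of thumb, while a range of 100 with a class interval of 10 gives 10 classes.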

# What should be the size of each class?

The answer to this question depends on the answer to the previous question. The equality (2) shows that, given the range of the variable, we can determine the number of classes once we decide the class interval. Similarly, we can determine the class interval once we decide the number of classes. Thus, these two decisions are interlinked. We cannot decide on one without deciding on the other. In Table 3.6, we have the number of classes as 10. Given the value of the range as 100, the class intervals are automatically 10 by the equality (2). Note that in the present context we have chosen class intervals that are equal in magnitude. However, we could have chosen class intervals that are not of equal magnitude. In that case, the classes would have been of unequal width.

# How should we determine the class limits?

When you classify raw data of a continuous variable as a frequency distribution, you are in effect grouping the individual observations into classes. *The value of the upper class limit of a class is obtained by adding the class interval to the value of the lower class limit of that class.*

For example, the upper class limit of the class 10 - 20 is 10 + 10 = 20 where 10 is the lower class limit and 10 is the class interval. This method is repeated for other classes as well.
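The rule of adding the class interval to the lower class limit can be repeated to generate all the class limits. The following is a minimal sketch, assuming equal class intervals and that each class starts where the previous one ends; the function name `build_classes` is our own.

```python
def build_classes(first_lower_limit, class_interval, number_of_classes):
    """Return a list of (lower limit, upper limit) pairs of equal width."""
    classes = []
    lower = first_lower_limit
    for _ in range(number_of_classes):
        upper = lower + class_interval  # upper limit = lower limit + class interval
        classes.append((lower, upper))
        lower = upper                   # the next class starts at this upper limit
    return classes


classes = build_classes(first_lower_limit=0, class_interval=10, number_of_classes=10)
print(classes[1])   # the class 10-20
print(classes[-1])  # the last class, 90-100
```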

But how do we decide the lower class limit of the first class? That is to say, why is 0 the lower class limit of the first class, 0-10? It is because we chose the minimum value of the variable as the lower limit of the first class. In fact, we could have chosen a value less than the minimum value of the variable as the lower limit of the first class. Similarly, for the upper class limit of the last class we could have chosen a value greater than the maximum value of the variable. It is important to note that, when a frequency distribution is being constructed, the class limits should be chosen so that the mid-point or class mark of each class coincides, as far as possible, with a value around which the data tend to be concentrated.

In our example on marks of 100 students, we chose 0 as the lower limit of the first class, 0-10, because the minimum marks were 0. That is also why we could not have chosen 1 as the lower class limit of that class: doing so would have excluded the observation 0. The upper class limit of the first class is then obtained by adding the class interval to the lower class limit of the class. Thus the upper class limit of the first class becomes 0 + 10 = 10. This procedure is followed for the other classes as well.

Have you noticed that the upper class limit of the first class is equal to the lower class limit of the second class? Both are equal to 10. The same is observed for the other classes, because we have used the *Exclusive Method* of classification of raw data. Under this method we form classes in such a way that the lower limit of a class coincides with the upper class limit of the previous class. This raises a problem: how do we classify an observation that is equal both to the upper class limit of a particular class and to the lower class limit of the next class?

For example, the observation 20 is equal to the upper class limit of the class 10-20 and also to the lower class limit of the class 20-30. In which of the two classes, 10-20 or 20-30, should we put the observation 20? We could put it either in class 10-20 or in class 20-30.

It is a dilemma that one commonly faces while classifying data in overlapping classes. In the *Exclusive Method*, this problem is solved by the rule of classification.

# Exclusive Method

By this method, the classes are formed in such a way that the upper class limit of one class equals the lower class limit of the next class. In this way the continuity of the data is maintained. That is why this method of classification is most suitable for data of a continuous variable. Under *this method, the upper class limit is excluded but the lower class limit of a class is included in the interval.* Thus an observation that is exactly equal to the upper class limit would, according to the method, not be included in that class but in the next class. If, on the other hand, it were equal to the lower class limit, it would be included in that class.

In our example on marks of students, the observation 20, which occurs twice in the raw data of Table 3.1, is not included in the class 15-20. It is included in the next class, 20-25. That is why we find the frequency corresponding to the class 15-20 to be 7 instead of 9.

There is another method of forming classes, known as the *Inclusive Method* of classification.
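The rule of the exclusive method (lower limit included, upper limit excluded) can be expressed as a simple comparison. The sketch below is illustrative only; `find_class_exclusive` is a hypothetical helper, and the classes are the ten classes of width 5 used in the example above.

```python
def find_class_exclusive(value, classes):
    """Return the class (lower, upper) with lower <= value < upper, or None."""
    for lower, upper in classes:
        if lower <= value < upper:  # lower limit included, upper limit excluded
            return (lower, upper)
    # Under this strict rule a value equal to the upper limit of the last
    # class belongs to no class; handling that case is a separate choice.
    return None


classes = [(lower, lower + 5) for lower in range(0, 50, 5)]

print(find_class_exclusive(19, classes))  # (15, 20)
print(find_class_exclusive(20, classes))  # (20, 25): 20 goes to the next class
```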

# Inclusive Method

In contrast to the *Exclusive Method*, the *Inclusive Method* does not exclude the upper class limit from a class interval. It includes the upper class limit in the class. Thus both class limits are part of the class interval. For example, in the frequency distribution of Table 3.4 we include in the class 700-799 those employees whose income is either Rs. 700, or between Rs. 700 and Rs. 799, or Rs. 799. If the income of an employee is exactly Rs. 800, then he is put in the next class: 800-899.

**TABLE 3.4**

Frequency Distribution of Incomes of 550 Employees of a Company

| Income (Rs.) | Number of employees |
|---|---|
| 700-799 | 150 |
| 800-899 | 80 |
| 900-999 | 120 |
| 1000-1099 | 130 |
| 1100-1199 | 50 |
| 1200-1299 | 20 |
| Total | 550 |
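For comparison, the inclusive rule treats both class limits as part of the class. A minimal sketch, using the income classes of Table 3.4 and a hypothetical helper `find_class_inclusive`:

```python
# Income classes (Rs.) of Table 3.4; both limits belong to the class.
income_classes = [
    (700, 799), (800, 899), (900, 999),
    (1000, 1099), (1100, 1199), (1200, 1299),
]


def find_class_inclusive(value, classes):
    """Return the class (lower, upper) with lower <= value <= upper, or None."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None


print(find_class_inclusive(799, income_classes))    # (700, 799)
print(find_class_inclusive(800, income_classes))    # (800, 899)
print(find_class_inclusive(799.5, income_classes))  # None: falls between two classes
```

The last call shows that an income such as Rs. 799.5 would fit into no class as the classes stand.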

# Adjustment in Class Interval

A close look at the *Inclusive Method* in Table 3.4 shows that, though the variable “income” is a continuous variable, no such continuity is maintained when the classes are made. We find a “gap” or discontinuity between the upper limit of a class and the lower limit of the next class. For example, between the upper limit of the first class, 799, and the lower limit of the next class, 800, we find a “gap” of 1. How, then, do we ensure the continuity of the variable while classifying data? This is done by making an adjustment in the class interval. The adjustment is done in the following way:

1. Find the difference between the lower limit of the second class and the upper limit of the first class. For example, in Table 3.4 the lower limit of the second class is 800 and the upper limit of the first class is 799. The difference between them is 1 (800 - 799 = 1).
2. Divide the difference obtained in (1) by two (1/2 = 0.5).
3. Subtract the value obtained in (2) from the lower limits of all classes (lower class limit - 0.5).
4. Add the value obtained in (2) to the upper limits of all classes (upper class limit + 0.5).

After the adjustment, which restores the continuity of data in the frequency distribution, Table 3.4 is modified into Table 3.5. After the adjustments in class limits, the equality (1) that determines the value of the class mark is modified as follows:

$$\text{Adjusted Class Mark} = \frac{\text{Adjusted Upper Class Limit} + \text{Adjusted Lower Class Limit}}{2}$$

**TABLE 3.5**

Frequency Distribution of Incomes of 550 Employees of a Company

| Income (Rs.) | Number of employees |
|---|---|
| 699.5-799.5 | 150 |
| 799.5-899.5 | 80 |
| 899.5-999.5 | 120 |
| 999.5-1099.5 | 130 |
| 1099.5-1199.5 | 50 |
| 1199.5-1299.5 | 20 |
| Total | 550 |
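The four adjustment steps can be sketched in a few lines. This is only an illustration, assuming the gap between consecutive classes is the same throughout; the function name `adjust_class_limits` is hypothetical.

```python
def adjust_class_limits(classes):
    """Close the gap between inclusive classes by moving each limit half a gap."""
    # Step 1: lower limit of the second class minus upper limit of the first class.
    gap = classes[1][0] - classes[0][1]
    # Step 2: divide the difference by two.
    half_gap = gap / 2
    # Steps 3 and 4: subtract from every lower limit, add to every upper limit.
    return [(lower - half_gap, upper + half_gap) for lower, upper in classes]


inclusive_classes = [
    (700, 799), (800, 899), (900, 999),
    (1000, 1099), (1100, 1199), (1200, 1299),
]
adjusted = adjust_class_limits(inclusive_classes)
print(adjusted[0])  # (699.5, 799.5)

# Adjusted class mark of the first class.
adjusted_mark = (adjusted[0][1] + adjusted[0][0]) / 2
print(adjusted_mark)  # 749.5
```

The adjusted limits should match the classes listed in Table 3.5.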


# How should we get the frequency for each class?

The *frequency of an observation means how many times that observation occurs in the raw data.* Table 3.6 shows, for each class, the observations, the corresponding tally marks, the frequency and the class mark.

**TABLE 3.6**

**Tally Marking of Marks of 100 Students in Mathematics**

| Class | Observations | Tally mark | Frequency | Class mark |
|---|---|---|---|---|
| 0-10 | 0 | / | 1 | 5 |
| 10-20 | 10, 14, 17, 12, 14, 12, 14, 14 | ~~////~~ /// | 8 | 15 |
| 20-30 | 25, 25, 20, 22, 25, 28 | ~~////~~ / | 6 | 25 |
| 30-40 | 30, 37, 34, 39, 32, 30, 35 | ~~////~~ // | 7 | 35 |
| 40-50 | 47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49, 46, 41, 40, 43, 48, 48, 49, 49, 40, 41 | ~~////~~ ~~////~~ ~~////~~ ~~////~~ / | 21 | 45 |
| 50-60 | 59, 51, 53, 56, 55, 57, 55, 51, 50, 56, 59, 56, 59, 57, 59, 55, 56, 51, 55, 56, 55, 50, 54 | ~~////~~ ~~////~~ ~~////~~ ~~////~~ /// | 23 | 55 |
| 60-70 | 60, 64, 62, 66, 69, 64, 64, 60, 66, 69, 62, 61, 66, 60, 65, 62, 65, 66, 65 | ~~////~~ ~~////~~ ~~////~~ //// | 19 | 65 |
| 70-80 | 70, 75, 70, 76, 70, 71 | ~~////~~ / | 6 | 75 |
| 80-90 | 82, 82, 82, 80, 85 | ~~////~~ | 5 | 85 |
| 90-100 | 90, 100, 90, 90 | //// | 4 | 95 |
| Total | | | 100 | |

**Courtesy:** NCERT textbook

Class frequency refers to the number of values in a particular class. The counting of class frequency is done by tally marks against the particular class.

# Finding class frequency by tally marking

A tally (/) is put against a class for each student whose marks are included in that class. For example, if the marks obtained by a student are 59, we put a tally (/) against class 50-60. If the marks are 61, a tally is put against the class 60-70. If someone obtains 40 marks, a tally is put against the class 40-50. Table 3.6 shows the tally marking of marks of 100 students in mathematics from Table 3.1. The counting of tallies is made easier when four of them are put as //// and the fifth tally is placed across them, shown here as ~~////~~. Tallies are then counted as groups of five. So if there are 16 tallies in a class, we put them as ~~////~~ ~~////~~ ~~////~~ / for the sake of convenience. Thus the frequency in a class is equal to the number of tallies against that class.
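The grouping of tallies into fives can be sketched as follows. The helper `tally_marks` is hypothetical, and writing a bundle of five as `~~////~~` (four strokes struck through) is merely a plain-text stand-in chosen for this sketch.

```python
def tally_marks(frequency, stroke="/", bundle="~~////~~"):
    """Render a frequency as tally marks, bundled in groups of five."""
    groups_of_five, remainder = divmod(frequency, 5)
    parts = [bundle] * groups_of_five      # one bundle per complete group of five
    if remainder:
        parts.append(stroke * remainder)   # leftover single strokes
    return " ".join(parts)


print(tally_marks(4))   # ////
print(tally_marks(5))   # ~~////~~
print(tally_marks(16))  # ~~////~~ ~~////~~ ~~////~~ /
```

Counting back, each bundle contributes five and each loose stroke contributes one, so the tallies for 16 give 3 × 5 + 1 = 16.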


# Loss of Information

The classification of data as a frequency distribution has an inherent shortcoming. While it summarises the raw data, making it concise and comprehensible, it does not show the details that are found in raw data. There is a loss of information in classifying raw data, though much is gained by summarising it as classified data. Once the data are grouped into classes, an individual observation has no significance in further statistical calculations. In Table 3.6, the class 20-30 contains 6 observations: 25, 25, 20, 22, 25 and 28. So when these data are grouped as a class 20-30 in the frequency distribution, the latter provides only the number of records in that class (i.e. frequency = 6) but not their actual values.

*All values in this class are assumed to be equal to the middle value of the class interval or class mark (i.e. 25). Further statistical calculations are based only on the values of the class mark and not on the values of the observations in that class.* This is true for all other classes as well. Thus using the class mark instead of the actual values of the observations in statistical methods involves a loss of information.
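The loss of information can be seen by comparing a calculation on the actual observations with the same calculation on the class mark. The sketch below uses the six observations of the class 20-30 from Table 3.6; taking the mean as the calculation is our own choice of illustration.

```python
from statistics import mean

# The six observations grouped into the class 20-30 in Table 3.6.
observations = [25, 25, 20, 22, 25, 28]

# After grouping, every observation is treated as equal to the class mark.
class_mark = (20 + 30) / 2
grouped_values = [class_mark] * len(observations)

actual_mean = mean(observations)     # uses the individual values
grouped_mean = mean(grouped_values)  # uses only the class mark

print(actual_mean)
print(grouped_mean)
```

The two means differ slightly (145/6 against 25), which is the kind of detail that is lost once only the class mark is retained.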

# Frequency distribution with unequal classes

In the earlier section, you learnt about frequency distributions with equal class intervals and saw how they are constructed out of raw data. But in some cases frequency distributions with unequal class intervals are more appropriate. If you observe the frequency distribution in Table 3.6, you will notice that most of the observations are in classes 40-50, 50-60 and 60-70. Their respective frequencies are 21, 23 and 19. It means that out of 100 observations, 63 (21+23+19) observations are concentrated in these classes. These classes are densely populated with observations. Thus, 63 percent of the data lie between 40 and 70. The remaining 37 percent of the data are in classes 0-10, 10-20, 20-30, 30-40, 70-80, 80-90 and 90-100. These classes are less populated with observations. Further, you will also notice that observations in these classes deviate more from their respective class marks than those in the other classes. But if classes are to be formed in such a way that class marks coincide, as far as possible, with a value around which the observations in a class tend to concentrate, then unequal class intervals are more appropriate.

Table 3.7 shows the same frequency distribution as Table 3.6 in terms of unequal classes. Each of the classes 40-50, 50-60 and 60-70 is split into *two* classes. The class 40-50 is divided into 40-45 and 45-50. The class 50-60 is divided into 50-55 and 55-60. And the class 60-70 is divided into 60-65 and 65-70. The new classes 40-45, 45-50, 50-55, 55-60, 60-65 and 65-70 have a class interval of 5. The other classes, 0-10, 10-20, 20-30, 30-40, 70-80, 80-90 and 90-100, retain their old class interval of 10. The last column of this table shows the new values of class marks for these classes. Compare them with the old values of class marks in Table 3.6. Notice that the observations in these classes deviated more from their old class mark values than from their new class mark values. Thus the new class mark values are more representative of the data in these classes than the old values.

**TABLE 3.7**

Frequency Distribution of Unequal Classes

| Class | Observation | Frequency | Class mark |
|---|---|---|---|
| 0-10 | 0 | 1 | 5 |
| 10-20 | 10, 14, 17, 12, 14, 12, 14, 14 | 8 | 15 |
| 20-30 | 25, 25, 20, 22, 25, 28 | 6 | 25 |
| 30-40 | 30, 37, 34, 39, 32, 30, 35 | 7 | 35 |
| 40-45 | 42, 44, 40, 44, 41, 40, 43, 40, 41 | 9 | 42.5 |
| 45-50 | 47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49 | 12 | 47.5 |
| 50-55 | 51, 53, 51, 50, 51, 50, 54 | 7 | 52.5 |
| 55-60 | 59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55, 56, 55, 56, 55 | 16 | 57.5 |
| 60-65 | 60, 64, 62, 64, 64, 60, 62, 61, 60, 62 | 10 | 62.5 |
| 65-70 | 66, 69, 66, 69, 66, 65, 65, 66, 65 | 9 | 67.5 |
| 70-80 | 70, 75, 70, 76, 70, 71 | 6 | 75 |
| 80-90 | 82, 82, 82, 80, 85 | 5 | 85 |
| 90-100 | 90, 100, 90, 90 | 4 | 95 |
| Total | | 100 | |
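One way to check that the new class marks are more representative is to compare how far the observations lie from the old and the new class marks. The sketch below does this for the class 40-50 using the observations listed in Table 3.7; measuring closeness by the mean absolute deviation is our own assumption, not a method stated in the text.

```python
def mean_abs_deviation(observations, class_mark):
    """Average distance of the observations from a given class mark."""
    return sum(abs(x - class_mark) for x in observations) / len(observations)


# Observations of the old class 40-50, split as in Table 3.7.
class_40_45 = [42, 44, 40, 44, 41, 40, 43, 40, 41]
class_45_50 = [47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49]
all_40_50 = class_40_45 + class_45_50

# Old class mark: 45 for every observation in 40-50.
old_deviation = mean_abs_deviation(all_40_50, 45)

# New class marks: 42.5 for 40-45 and 47.5 for 45-50.
total_new = (
    sum(abs(x - 42.5) for x in class_40_45)
    + sum(abs(x - 47.5) for x in class_45_50)
)
new_deviation = total_new / len(all_40_50)

print(round(old_deviation, 2), round(new_deviation, 2))
```

Under this measure the observations lie, on average, about half as far from the new class marks as from the old one.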

Figure 3.2 shows the frequency curve of the distribution in Table 3.7. The class marks of the table are plotted on the X-axis and the frequencies are plotted on the Y-axis.

**Courtesy:** NCERT textbook

# Frequency array

Till now we have discussed the classification of data for a continuous variable using the example of percentage marks of 100 students in mathematics. For a discrete variable, the classification of its data is known as a *Frequency Array*. Since a discrete variable takes only integral values and not intermediate fractional values between two integral values, we have frequencies that correspond to each of its integral values. The example in Table 3.8 illustrates a *Frequency Array*.

**TABLE 3.8**

Frequency Array of the Size of Households

| Type of the House | Number of Houses |
|---|---|
| 1 | 25 |
| 2 | 10 |
| 3 | 14 |
| 4 | 19 |
| 5 | 10 |
| 6 | 10 |
| 7 | 5 |
| 8 | 7 |
| Total | 100 |

The variable “Type of the house” is a discrete variable that only takes integral values as shown in the table. Since it does not take any fractional value between two adjacent integral values, there are no classes in this frequency array. Since there are no classes in a frequency array there would be no class intervals. As the classes are absent in a discrete frequency distribution, there is no class mark as well.
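A frequency array can be built by counting how often each value occurs. The raw data behind Table 3.8 are not given, so the short list below is hypothetical and serves only to illustrate the idea.

```python
from collections import Counter

# Hypothetical raw observations of a discrete variable (not the data of Table 3.8).
observed_values = [1, 3, 2, 1, 4, 3, 1, 2, 4, 4, 1, 3]

# Each distinct value gets its own frequency; no classes are needed.
frequency_array = Counter(observed_values)

for value in sorted(frequency_array):
    print(value, frequency_array[value])

print("Total", sum(frequency_array.values()))
```

Because every distinct value has its own frequency, there are no class limits, class intervals or class marks to compute.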
