Techniques for Measuring Correlation
The commonly used methods for the study of correlation are:
 Scatter diagrams
 Karl Pearson's coefficient of correlation
 Spearman's rank correlation
Karl Pearson's coefficient of correlation gives a numerical measure of the association between two variables when that association can be represented by a straight line on a graph. Spearman's coefficient of correlation is another measure, which gives the linear relationship between ranks assigned to individual items according to their attributes. Attributes are those variables which cannot be numerically measured, such as beauty, capability, honesty, dependability, etc.
Scatter Diagram
A scatter diagram is a type of diagram that shows the association between two variables that seem to be related. One variable is represented along the x-axis and the other along the y-axis. Each pair of observations is then represented by a single point. The cluster of points so plotted is referred to as a scatter diagram. From a scatter diagram, one can get a fairly good idea of the nature of the relationship.
Scatter diagrams show one of six possible correlations between the variables:
 Strong Positive Correlation: The value of Y clearly increases as the value of X increases.
 Strong Negative Correlation: The value of Y clearly decreases as the value of X increases.
 Weak Positive Correlation: The value of Y increases slightly as the value of X increases.
 Weak Negative Correlation: The value of Y decreases slightly as the value of X increases.
 Complex Correlation: The value of Y seems to be related to the value of X, but the relationship is not easily determined.
 No Correlation: There is no demonstrated connection between the two variables.
Activity: Collect the following data from 10 of your classmates
Name 
Height (in m) 
Time (in s) for 50m race 
Long jump (in feet) 
High jump (in feet) 
Then plot a graph for each pair of data sets and see if you can comment on their correlation by looking at the scatter diagram.
Karl Pearson's Coefficient of Correlation
Pearson's product moment correlation coefficient, usually denoted by r, is a measure of the linear association between two variables X and Y that have been measured on interval or ratio scales, such as the relationship between height in metres and weight in kilograms. However, it can be misleading when the relationship between the variables is not linear. The linear relationship may be given by
$$Y = a + bX$$
This is the equation of a straight line which has y-intercept equal to $a$ and slope equal to $b$. If the relation cannot be represented by a straight line as above, and we instead have a relationship that can be written as, for example,
$$Y = X^2$$
with the values of X spread symmetrically about zero, the value of the coefficient will be zero. This shows that zero correlation need not mean the absence of any type of relation between the two variables.
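The point that zero correlation does not rule out a relationship can be checked numerically. The sketch below is illustrative only: the data are made up, the curved relation $Y = X^2$ with X symmetric about zero is an assumed example, and `pearson_r` is a hypothetical helper that applies the product moment formula defined later in this section.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Made-up values: X is symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

# An exact but curved relation, Y = X^2.
r_curve = pearson_r(xs, [x ** 2 for x in xs])

# An exact straight-line relation, Y = 2 + 5X.
r_line = pearson_r(xs, [2 + 5 * x for x in xs])

print("curve:", r_curve)
print("line: ", r_line)
```

Both relations are exact, yet only the straight line produces a coefficient of magnitude one. The curve gives zero because the positive and negative deviation products cancel.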
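Let $X_1, X_2, \ldots, X_n$ be n values of $X$ and $Y_1, Y_2, \ldots, Y_n$ be the corresponding values of $Y$. The arithmetic means of $X$ and $Y$ are defined as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
and their variances are as follows.
$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \quad\text{and}\quad \sigma_Y^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$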
The standard deviations of X and Y respectively are the positive square roots of their variances. A standard deviation is never negative, and it is zero only when all the values of the variable are equal.
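Covariance of X and Y is defined as
$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are the deviations of the ith value of $X$ and $Y$ from their mean values respectively. The sign of covariance between $X$ and $Y$ determines the sign of the correlation coefficient.
If the covariance is zero, the correlation coefficient is always zero. The product moment correlation, or Karl Pearson's measure of correlation, is given by
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}}$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.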
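The definitions above translate almost line for line into code. The following is a minimal sketch, assuming the divisor n in the variance and covariance; the function names and the five data pairs are made up for illustration.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def variance(values):
    """Mean squared deviation from the mean (divisor n, an assumption)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(xs, ys):
    """Mean product of paired deviations from the two means."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (math.sqrt(variance(xs)) * math.sqrt(variance(ys)))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = covariance(xs, ys)
r_xy = pearson_r(xs, ys)
print("covariance:", cov_xy)
print("r:", r_xy)
```

Because both standard deviations are positive here, r takes its sign from the covariance. The divisor n cancels in the ratio, so using n - 1 throughout would give the same r.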
Example: We want to calculate the correlation between the marks of some students and the number of hours they spend in front of the TV per week.
Marks in the test, X_{i}    Hours of TV per week, Y_{i}
106                         10
56                          40
100                         17
101                         15
99                          20
103                         12
97                          21
113                         7
112                         8
110                         9
Have a look at the scatter plot. What do you notice?
Now let us calculate the correlation coefficient by Karl Pearson's Correlation Coefficient formula.
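In the table below, X_{i} is the marks in the test and Y_{i} is the hours of TV per week. The deviations $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are taken from the means of the first two columns, $\bar{X} = 777/10 = 77.7$ and $\bar{Y} = 272/10 = 27.2$.

X_{i}    Y_{i}    x_{i}    y_{i}    x_{i}^{2}    y_{i}^{2}    x_{i}y_{i}
106      10       +28.3    -17.2    800.89       295.84       -486.76
56       40       -21.7    +12.8    470.89       163.84       -277.76
60       37       -17.7    +9.8     313.29       96.04        -173.46
71       25       -6.7     -2.2     44.89        4.84         +14.74
49       41       -28.7    +13.8    823.69       190.44       -396.06
103      12       +25.3    -15.2    640.09       231.04       -384.56
97       21       +19.3    -6.2     372.49       38.44        -119.66
83       27       +5.3     -0.2     28.09        0.04         -1.06
42       50       -35.7    +22.8    1274.49      519.84       -813.96
110      9        +32.3    -18.2    1043.29      331.24       -587.86
Total    777      272      0        0            5812.10      1871.60      -3226.40

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \, \sum y_i^2}} = \frac{-3226.40}{\sqrt{5812.10 \times 1871.60}} \approx -0.98$$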
This shows that there is a strong negative correlation (r = -0.98) between marks and the number of hours in a week spent watching TV.
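The arithmetic of the worked example can be reproduced with a short script. This is a sketch rather than a prescribed method: it uses the ten (X, Y) pairs from the first two columns of the calculation table, and the variable names are chosen here for illustration.

```python
import math

# The ten (marks, TV hours) pairs from the calculation table.
marks = [106, 56, 60, 71, 49, 103, 97, 83, 42, 110]
tv_hours = [10, 40, 37, 25, 41, 12, 21, 27, 50, 9]

n = len(marks)
mean_x = sum(marks) / n
mean_y = sum(tv_hours) / n

# One row per student: deviations, their squares and their product.
rows = []
for big_x, big_y in zip(marks, tv_hours):
    x = big_x - mean_x
    y = big_y - mean_y
    rows.append((x, y, x * x, y * y, x * y))

sum_x2 = sum(row[2] for row in rows)
sum_y2 = sum(row[3] for row in rows)
sum_xy = sum(row[4] for row in rows)

r = sum_xy / math.sqrt(sum_x2 * sum_y2)

print(f"means: {mean_x:.1f}, {mean_y:.1f}")
print(f"sums:  {sum_x2:.2f}, {sum_y2:.2f}, {sum_xy:.2f}")
print(f"r = {r:.2f}")
```

Nine of the ten deviation products are negative, which is why the sum of products, and hence r, is negative.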
Properties of Correlation Coefficient
The following are some noteworthy properties of the correlation coefficient:
 r has no unit. It is a pure number, free of units of measurement.
 A negative value of r indicates a negative correlation: the two variables move in opposite directions. A positive value indicates that the two variables move in the same direction.
 If r = 1 or r = -1 the correlation is perfect. The relation between the variables is exact.
 A high value of r indicates a strong linear relationship. Its value is said to be high when it is close to +1 or -1.
 A low value of r indicates a weak linear relation. Its value is said to be low when it is close to zero.
 The value of the correlation coefficient lies between minus one and plus one, -1 ≤ r ≤ 1. (If, in an exercise, the value of r is outside this range, it indicates an error in calculation.)
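 The value of r is unaffected by the change of origin and change of scale. Given two variables X and Y let us define two new variables
$$U = \frac{X - A}{h}, \qquad V = \frac{Y - B}{k}$$
where A and B are any constants (the new origins) and h and k are non-zero constants of the same sign (the new scales). Then $r_{XY} = r_{UV}$.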
Step deviation method to calculate correlation coefficient
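Since r is independent of change in origin and scale, the cumbersome calculations for large values of the variables can be reduced by using the last property of r. It involves the transformation of the variables X and Y as follows:
$$dx' = \frac{X - A}{h}, \qquad dy' = \frac{Y - B}{k}$$
where A and B are assumed means and h and k are common factors of the deviations. The coefficient is then computed from the transformed values:
$$r = \frac{\sum dx'\,dy' - \frac{(\sum dx')(\sum dy')}{n}}{\sqrt{\sum dx'^2 - \frac{(\sum dx')^2}{n}}\;\sqrt{\sum dy'^2 - \frac{(\sum dy')^2}{n}}}$$
Example: Calculate the correlation coefficient for the following data using the step deviation method.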
Price index (X)               120     150     190     220     230
Money supply in crores (Y)    1800    2000    2500    2700    3000
Solution: Since the values of X and Y are large here, we choose assumed means A and B, together with common factors, so that X and Y can be transformed into new, smaller variables without affecting the correlation.
X        dx = X - 190    dx' = dx/10    dx'^{2}    Y       dy = Y - 2500    dy' = dy/100    dy'^{2}    dx'dy'
120      -70             -7             49         1800    -700             -7              49         49
150      -40             -4             16         2000    -500             -5              25         20
190      0               0              0          2500    0                0               0          0
220      +30             +3             9          2700    +200             +2              4          6
230      +40             +4             16         3000    +500             +5              25         20
Total                    -4             90                                  -5              103        95
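Let the transformed variables be
$$dx' = \frac{X - 190}{10}, \qquad dy' = \frac{Y - 2500}{100}$$
With n = 5, $\sum dx' = -4$, $\sum dy' = -5$, $\sum dx'^2 = 90$, $\sum dy'^2 = 103$ and $\sum dx'\,dy' = 95$, we get
$$r = \frac{95 - \frac{(-4)(-5)}{5}}{\sqrt{90 - \frac{(-4)^2}{5}}\;\sqrt{103 - \frac{(-5)^2}{5}}} = \frac{91}{\sqrt{86.8 \times 98}} \approx 0.99$$
This shows that there is a strong positive correlation between price index and money supply.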
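The step deviation calculation can be checked against the untransformed data. The sketch below assumes the origins 190 and 2500 and the common factors 10 and 100, which are the values implied by the deviation columns of the table; `r_from_sums` is a hypothetical helper name.

```python
import math

price_index = [120, 150, 190, 220, 230]
money_supply = [1800, 2000, 2500, 2700, 3000]

# Assumed means and common factors, as implied by the deviation columns.
A, h = 190, 10
B, k = 2500, 100

dx = [(x - A) / h for x in price_index]
dy = [(y - B) / k for y in money_supply]

def r_from_sums(u, v):
    """Correlation from raw sums, valid for any origin and scale."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(a * b for a, b in zip(u, v))
    sum_uu = sum(a * a for a in u)
    sum_vv = sum(b * b for b in v)
    numerator = sum_uv - sum_u * sum_v / n
    denominator = math.sqrt((sum_uu - sum_u ** 2 / n) * (sum_vv - sum_v ** 2 / n))
    return numerator / denominator

r_step = r_from_sums(dx, dy)                      # small transformed numbers
r_raw = r_from_sums(price_index, money_supply)    # original large numbers

print(dx, dy)
print(f"step deviation r = {r_step:.4f}, direct r = {r_raw:.4f}")
```

The two results agree, which is the change-of-origin-and-scale property at work: the transformation only makes the sums easier to handle by hand.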
Spearman's Rank Correlation
Spearman developed the rank correlation formula for variables that cannot be measured numerically in the way that salary, height or number of children can. Ranking is used when the variables are not quantifiable, for example where we are required to give measurements for beauty, melody, rhythm, etc. (as in the case of reality shows on television).
There are also situations when you are required to quantify qualities such as truthfulness, honesty, perseverance, compatibility etc. (as in the case of an office or college). Ranking is considered to be a better alternative to quantification of qualities.
When the data contain extreme values that distort the simple correlation coefficient, rank correlation provides a better alternative to simple correlation. The interpretation of the rank correlation coefficient and the simple correlation coefficient remains the same. The formula for the rank correlation coefficient is derived from the simple correlation coefficient, with individual values replaced by ranks.
The rank coefficient gives us a measure of linear relationship between ranks assigned to these units and not their values. It is also called the Product Moment Correlation between the ranks.
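The formula is
$$r_s = 1 - \frac{6\sum D^2}{N^3 - N}$$
where N is the number of observations and D is the difference between the two ranks assigned to the same individual. When ranks are repeated, the formula becomes
$$r_s = 1 - \frac{6\left[\sum D^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{N^3 - N}$$
where $m_1, m_2, \ldots$ are the numbers of times each repeated rank occurs and $(m^3 - m)/12$ is the corresponding correction factor.
This correction is needed for every repeated value of both variables. All the properties of the simple correlation coefficient are applicable here. It lies between -1 and 1. However, it is generally not as accurate as the ordinary method, because ranks are used and so not all the information in the data is taken into account.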
Example: Given below is the percentage of marks secured by 5 students in Economics and Statistics:
Student                A     B     C     D     E
Marks in Economics     60    48    49    50    55
Marks in Statistics    85    60    55    65    75
Calculate the coefficient of rank correlation.
Solution: Let us assign ranks to the marks obtained by the students in each of the subjects:
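Marks in Eco. (X)    R_{1}    Marks in Stat. (Y)    R_{2}    D = R_{1} - R_{2}    D^{2}
60                   1        85                    1        0                    0
48                   5        60                    4        +1                   1
49                   4        55                    5        -1                   1
50                   3        65                    3        0                    0
55                   2        75                    2        0                    0
N = 5                                                                             Total = 2

Using Spearman's rank correlation coefficient formula,
$$r_s = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 2}{5^3 - 5} = 1 - \frac{12}{120} = 0.9$$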
This indicates that there is a high degree of positive relationship between the marks in Economics and Statistics.
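The ranking and the formula in this example can be sketched in a few lines. The helper `ranks_descending` is hypothetical and assumes no tied marks, as is the case here, with rank 1 given to the highest mark. The script also checks that the product moment correlation computed on the ranks gives the same value.

```python
import math

def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

economics = [60, 48, 49, 50, 55]
statistics_marks = [85, 60, 55, 65, 75]

r1 = ranks_descending(economics)
r2 = ranks_descending(statistics_marks)

# Rank differences and Spearman's formula.
d = [a - b for a, b in zip(r1, r2)]
n = len(d)
sum_d2 = sum(diff * diff for diff in d)
r_s = 1 - 6 * sum_d2 / (n ** 3 - n)

# Product moment correlation applied to the ranks themselves.
mean_rank = (n + 1) / 2
sum_xy = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(r1, r2))
sum_xx = sum((a - mean_rank) ** 2 for a in r1)
sum_yy = sum((b - mean_rank) ** 2 for b in r2)
r_on_ranks = sum_xy / math.sqrt(sum_xx * sum_yy)

print(r1, r2, d)
print(r_s, r_on_ranks)
```

With tied marks this simple ranking would no longer be adequate, and the rank formula would need a correction for the repeated values.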