
 

Techniques for Measuring Correlation

The commonly used methods for the study of correlation are:
  • Scatter diagrams,
  • Karl Pearson's coefficient of correlation and
  • Spearman's rank correlation.
A scatter diagram can be used to speculate about cause-and-effect relationships of variables and to search for root causes of an identified problem. Although it gives us an idea about the correlation, it fails to assign any numerical values to the relationship.

A numerical measure of association between two variables (that can be represented by a straight line on a graph) is given by Karl Pearson's coefficient of correlation. Spearman's coefficient of correlation is another measure, which gives the linear relationship between ranks assigned to individual items according to their attributes. Attributes are those variables which cannot be numerically measured, such as beauty, capability, honesty, dependability, etc.

Scatter Diagram

A scatter diagram is a type of diagram that shows the association between two variables which seem to be related. One variable is represented along the x-axis and the other along the y-axis. Each pair of observations is then represented by a single point.

The cluster of points, so plotted, is referred to as a scatter diagram. From a scatter diagram, one can get a fairly good idea of the nature of the relationship.

Scatter diagrams show one of six possible correlations between the variables:
  1. Strong Positive Correlation: The value of Y clearly increases as the value of X increases.
  2. Strong Negative Correlation: The value of Y clearly decreases as the value of X increases.
  3. Weak Positive Correlation: The value of Y increases slightly as the value of X increases.
  4. Weak Negative Correlation: The value of Y decreases slightly as the value of X increases.
  5. Complex Correlation: The value of Y seems to be related to the value of X, but the relationship is not easily determined.
  6. No Correlation: There is no demonstrated connection between the two variables.

Activity: Collect the following data from 10 of your classmates:

| Name | Height (in m) | Time (in s) for 50 m race | Long jump (in feet) | High jump (in feet) |
| --- | --- | --- | --- | --- |
|  |  |  |  |  |

Then plot a graph for each pair of data sets and see if you can comment on their correlation by looking at the scatter diagram.

Karl Pearson's Coefficient of Correlation

Pearson's product moment correlation coefficient, usually denoted by r, is a measure of the linear association between two variables X and Y, that have been measured on interval or ratio scales, such as the relationship between height in metres and weight in kilograms. However, it can be misleading when the relationship between the variables is not linear.

The linear relationship may be given by

$$Y = a + bX$$

This is the equation of a straight line which has y-intercept equal to $a$ and slope equal to $b$. If the relation cannot be represented by a straight line as above, and we instead have a non-linear relationship such as

$$Y = X^2$$

with the values of X spread symmetrically about zero, the value of the coefficient will be zero. This shows that zero correlation need not mean the absence of any type of relation between the two variables.

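Let $X_1, X_2, \ldots, X_n$ be n values of X and $Y_1, Y_2, \ldots, Y_n$ be the corresponding values of Y. The arithmetic means of X and Y are defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

and their variances are as follows.

$$\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \quad \text{and} \quad \sigma_Y^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$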

The standard deviations of X and Y respectively are the positive square roots of their variances. A standard deviation is never negative, and it is zero only when all the values are equal.

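Covariance of X and Y is defined as

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are the deviations of the ith value of X and Y from their mean values respectively. The sign of the covariance between X and Y determines the sign of the correlation coefficient.

If the covariance is zero, the correlation coefficient is always zero. The product moment correlation, or Karl Pearson's measure of correlation, is given by

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.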
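The definitions of mean, variance, covariance and r can be sketched in a few lines of Python. This is only an illustration: the function names and both small data sets are made up for the purpose and do not come from the lesson. The first data set lies exactly on a straight line; the second follows Y = X² with X values symmetric about zero, a case where the relation is exact but not linear.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Average product of the deviations of X and Y from their means."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def variance(values):
    """Variance is the covariance of a variable with itself."""
    return covariance(values, values)


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (sqrt(variance(xs)) * sqrt(variance(ys)))


# Hypothetical data on the straight line Y = 2 + 3X.
xs = [1, 2, 3, 4, 5]
line = [2 + 3 * x for x in xs]
r_line = pearson_r(xs, line)

# Hypothetical data following Y = X^2, with X symmetric about zero.
sym = [-2, -1, 0, 1, 2]
squares = [x * x for x in sym]
r_square = pearson_r(sym, squares)

print(round(r_line, 6), round(r_square, 6))
```

In the first case r comes out as 1 (up to floating-point rounding). In the second case the covariance is zero, so r is zero even though Y is completely determined by X.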


Example: In the example below we want to calculate the correlation between the marks of some students and the number of hours they spend in front of the TV per week.

| Marks in the test, Xi | Hours of TV per week, Yi |
| --- | --- |
| 106 | 10 |
| 56 | 40 |
| 60 | 37 |
| 71 | 25 |
| 49 | 41 |
| 103 | 12 |
| 97 | 21 |
| 83 | 27 |
| 42 | 50 |
| 110 | 9 |


Have a look at the scatter plot. What do you notice?


 

We can see that there is strong negative correlation between the marks obtained and the number of hours spent watching TV.

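Now let us calculate the correlation coefficient using Karl Pearson's correlation coefficient formula. Here $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are the deviations from the means.

| Marks in the test, Xi | Hours of TV per week, Yi | $x_i$ | $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
| --- | --- | --- | --- | --- | --- | --- |
| 106 | 10 | 28.3 | -17.2 | 800.89 | 295.84 | -486.76 |
| 56 | 40 | -21.7 | 12.8 | 470.89 | 163.84 | -277.76 |
| 60 | 37 | -17.7 | 9.8 | 313.29 | 96.04 | -173.46 |
| 71 | 25 | -6.7 | -2.2 | 44.89 | 4.84 | 14.74 |
| 49 | 41 | -28.7 | 13.8 | 823.69 | 190.44 | -396.06 |
| 103 | 12 | 25.3 | -15.2 | 640.09 | 231.04 | -384.56 |
| 97 | 21 | 19.3 | -6.2 | 372.49 | 38.44 | -119.66 |
| 83 | 27 | 5.3 | -0.2 | 28.09 | 0.04 | -1.06 |
| 42 | 50 | -35.7 | 22.8 | 1274.49 | 519.84 | -813.96 |
| 110 | 9 | 32.3 | -18.2 | 1043.29 | 331.24 | -587.86 |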


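Adding up the last three columns of the table gives $\sum x_i^2 = 5812.10$, $\sum y_i^2 = 1871.60$ and $\sum x_i y_i = -3226.40$, so

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}} = \frac{-3226.40}{\sqrt{5812.10 \times 1871.60}} \approx -0.98$$

This shows that there is a strong negative correlation (r = -0.98) between marks and the number of hours per week spent watching TV.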
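The same calculation can be checked with a short Python sketch. The marks and TV hours are the ten pairs from the calculation table; the variable names are chosen here for illustration only.

```python
from math import sqrt

# The ten (marks, TV hours) pairs from the calculation table.
marks = [106, 56, 60, 71, 49, 103, 97, 83, 42, 110]
tv_hours = [10, 40, 37, 25, 41, 12, 21, 27, 50, 9]

n = len(marks)
x_bar = sum(marks) / n      # mean of the marks
y_bar = sum(tv_hours) / n   # mean of the TV hours

# Deviations of each value from its mean.
x_dev = [x - x_bar for x in marks]
y_dev = [y - y_bar for y in tv_hours]

# Column totals used in Karl Pearson's formula.
sum_x2 = sum(x * x for x in x_dev)
sum_y2 = sum(y * y for y in y_dev)
sum_xy = sum(x * y for x, y in zip(x_dev, y_dev))

r = sum_xy / sqrt(sum_x2 * sum_y2)

print(f"mean marks = {x_bar:.1f}, mean hours = {y_bar:.1f}")
print(f"r = {r:.2f}")
```

Working with deviations from the mean mirrors the columns of the table, so each intermediate list can be compared with the corresponding column by hand.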

Properties of Correlation Coefficient

The following are some noteworthy properties of the correlation coefficient:
  • r has no unit. It is a pure number, free of units of measurement.
  • A negative value of r indicates a negative correlation: the two variables move in opposite directions. A positive value indicates that the two variables move in the same direction.
  • If r = 1 or r = -1 the correlation is perfect. The relation between them is exact.
  • A high value of r indicates strong linear relationship. Its value is said to be high when it is close to +1 or -1.
  • A low value of r indicates a weak linear relation. Its value is said to be low when it is close to zero.
  • The value of the correlation coefficient lies between minus one and plus one,
    -1 ≤ r ≤ 1.

    (If, in an exercise, the value of r is outside this range, it indicates an error in calculation.)

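  • The value of r is unaffected by a change of origin and a change of scale. Given two variables X and Y, let us define two new variables

    $$U = \frac{X - A}{h}, \qquad V = \frac{Y - B}{k}$$

    where A and B are constants (assumed means) and h and k are positive constants (common factors). Then $r_{UV} = r_{XY}$.
This property can be used to calculate the correlation coefficient of a data set in a simplified manner.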
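A small Python sketch can illustrate this property. The height and weight figures, the constants subtracted and the divisors are all hypothetical values picked for the illustration. The last case uses a negative divisor for one variable only, to show why the divisors are usually taken with the same sign: the magnitude of r is unchanged, but its sign flips.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical observations, made up for this illustration.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 77]

r_original = pearson_r(heights_cm, weights_kg)

# Change of origin (subtract a constant) and scale (divide by a positive constant).
u = [(x - 165) / 5 for x in heights_cm]
v = [(y - 60) / 2 for y in weights_kg]
r_shifted = pearson_r(u, v)

# A negative divisor for one variable reverses the direction of that variable.
w = [(y - 60) / -2 for y in weights_kg]
r_flipped = pearson_r(u, w)

print(r_original, r_shifted, r_flipped)
```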

Step deviation method to calculate correlation coefficient

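Since r is independent of change of origin and scale, the cumbersome calculations for large values of the variables can be reduced by using the last property of r. It involves the transformation of the variables X and Y as follows:

$$dx' = \frac{X - A}{h}, \qquad dy' = \frac{Y - B}{k}$$

where A and B are assumed means and h and k are positive common factors. The correlation coefficient between dx' and dy' is the same as that between X and Y.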

Example: Calculate the correlation coefficient for the following data using the step deviation method.

| Price index (X) | 120 | 150 | 190 | 220 | 230 |
| --- | --- | --- | --- | --- | --- |
| Money supply in crores (Y) | 1800 | 2000 | 2500 | 2700 | 3000 |

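Solution: Since the values of X and Y are large here, we choose assumed means A = 190 and B = 2500 and common factors 10 and 100, so that we can transform X and Y into new variables without affecting the correlation.

| X | dx = X - 190 | dx' = dx/10 | dx'² | Y | dy = Y - 2500 | dy' = dy/100 | dy'² | dx'dy' |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 120 | -70 | -7 | 49 | 1800 | -700 | -7 | 49 | 49 |
| 150 | -40 | -4 | 16 | 2000 | -500 | -5 | 25 | 20 |
| 190 | 0 | 0 | 0 | 2500 | 0 | 0 | 0 | 0 |
| 220 | 30 | 3 | 9 | 2700 | +200 | +2 | 4 | 6 |
| 230 | 40 | 4 | 16 | 3000 | +500 | +5 | 25 | 20 |
| Total |  | -4 | 90 |  |  | -5 | 103 | 95 |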

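Let the transformed variables be

$$dx' = \frac{X - 190}{10}, \qquad dy' = \frac{Y - 2500}{100}$$

With n = 5, $\sum dx' = -4$, $\sum dy' = -5$, $\sum (dx')^2 = 90$, $\sum (dy')^2 = 103$ and $\sum dx'\,dy' = 95$, we get

$$r = \frac{\sum dx'\,dy' - \dfrac{(\sum dx')(\sum dy')}{n}}{\sqrt{\sum (dx')^2 - \dfrac{(\sum dx')^2}{n}}\;\sqrt{\sum (dy')^2 - \dfrac{(\sum dy')^2}{n}}} = \frac{95 - 4}{\sqrt{86.8}\,\sqrt{98}} \approx 0.987$$

It shows that there is a strong positive correlation between price index and money supply.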
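The step deviation calculation can be reproduced with a short Python sketch. The data are the price index and money supply values from the example, and the assumed means (190 and 2500) and common factors (10 and 100) are the ones implied by the dx and dy columns of the table; the variable names are illustrative.

```python
from math import sqrt

price_index = [120, 150, 190, 220, 230]
money_supply = [1800, 2000, 2500, 2700, 3000]

# Assumed means and common factors, as read off the table columns.
A, h = 190, 10
B, k = 2500, 100

# Step deviations: the dx' and dy' columns of the table.
dx = [(x - A) / h for x in price_index]
dy = [(y - B) / k for y in money_supply]

n = len(dx)
sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Shortcut formula for r in terms of the column totals.
numerator = sum_dxdy - sum_dx * sum_dy / n
denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
r = numerator / denominator

print(f"r = {r:.3f}")
```

Because the step deviations are small whole numbers, the totals are easy to verify by hand, which is the point of the method.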

Spearman's rank correlation

Spearman developed the rank correlation formula for variables that cannot be measured numerically in the way that salary, height, number of children, etc. can. Ranking is used when the variables are not quantifiable, for example where we are required to give measurements for beauty, melody, rhythm, etc. (as in the case of reality shows on television).

There are also situations when you are required to quantify qualities such as truthfulness, honesty, perseverance, compatibility etc. (as in the case of an office or college). Ranking is considered to be a better alternative to quantification of qualities.

When extreme values distort the simple correlation coefficient, rank correlation provides a better alternative to simple correlation. The interpretation of the rank correlation coefficient is the same as that of the simple correlation coefficient. The formula for the rank correlation coefficient is derived from the simple correlation coefficient, with individual values replaced by ranks.

The rank coefficient gives us a measure of linear relationship between ranks assigned to these units and not their values. It is also called the Product Moment Correlation between the ranks.

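The formula is

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where D is the difference between the two ranks assigned to an item and N is the number of items. When ranks are repeated, a correction factor of $(m^3 - m)/12$ is added to $\sum D^2$ for each group of m items sharing the same rank:

$$r_s = 1 - \frac{6\left[\sum D^2 + \dfrac{m_1^3 - m_1}{12} + \dfrac{m_2^3 - m_2}{12} + \cdots\right]}{N(N^2 - 1)}$$

This correction is needed for every repeated value of both variables. All the properties of the simple correlation coefficient are applicable here: it lies between -1 and 1. However, it is generally not as accurate as the ordinary method, because ranks are used and not all the information in the data is taken into consideration.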

Example: Given below is the percentage of marks secured by 5 students in Economics and Statistics:

| Student | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- |
| Marks in Economics | 60 | 48 | 49 | 50 | 55 |
| Marks in Statistics | 85 | 60 | 55 | 65 | 75 |


Calculate the coefficient of rank correlation.

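Solution: Let us assign ranks to the marks obtained by the students in each of the subjects:

| Marks in Eco. (X) | R1 | Marks in Stat. (Y) | R2 | D = R1 - R2 | D² |
| --- | --- | --- | --- | --- | --- |
| 60 | 1 | 85 | 1 | 0 | 0 |
| 48 | 5 | 60 | 4 | +1 | 1 |
| 49 | 4 | 55 | 5 | -1 | 1 |
| 50 | 3 | 65 | 3 | 0 | 0 |
| 55 | 2 | 75 | 2 | 0 | 0 |
| N = 5 |  |  |  |  | ΣD² = 2 |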


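Using Spearman's rank correlation coefficient formula with $\sum D^2 = 2$ and N = 5,

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 2}{5(5^2 - 1)} = 1 - \frac{12}{120} = 0.9$$

It indicates that there is a high degree of positive relationship between the marks in Economics and Statistics.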
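The ranking and the formula can also be sketched in Python. The marks are those of the five students in the example; the function names are made up for this illustration. The sketch assumes there are no tied values and deliberately refuses tied data, because ties would need the correction term for repeated ranks.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank_correlation(xs, ys):
    """Rank correlation 1 - 6*sum(D^2) / (N(N^2 - 1)), for data without ties."""
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("tied values need the correction for repeated ranks")
    n = len(xs)
    r1 = ranks_descending(xs)
    r2 = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Marks of students A to E from the example.
economics = [60, 48, 49, 50, 55]
statistics_marks = [85, 60, 55, 65, 75]

r_s = spearman_rank_correlation(economics, statistics_marks)
print(ranks_descending(economics), ranks_descending(statistics_marks), r_s)
```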



