Regression Analysis
Regression analysis is the statistical technique that expresses the functional relationship between two or more variables as an equation, so that the value of one variable can be estimated from a given value of another.
The variable whose value is to be estimated is called the dependent variable, and the variable whose value is used to make the estimate is called the independent variable.
A linear algebraic equation that expresses a dependent variable in terms of an independent variable is called a linear regression equation.
For example, if the sales and advertising expenses for a product are correlated, regression analysis can estimate the dependent variable (sales in this case) for a given value of the independent variable (advertising expenses in this case).
For bivariate data (x, y), the regression equation obtained under the assumption that y depends on x is called the regression of y on x, and it is given by
y = a + bx
For bivariate data (x, y), the regression equation obtained under the assumption that x depends on y is called the regression of x on y, and it is given by
x = a + by
Both regression equations have the same form as the equation of a straight line.
Accordingly, ‘a’ is the intercept and ‘b’ is the slope of the line represented by each equation. The values of ‘a’ and ‘b’ can be found by solving the following pair of equations simultaneously.
For the regression of y on x:
∑y = na + b∑x
∑xy = a∑x + b∑x^{2}
For the regression of x on y:
∑x = na + b∑y
∑xy = a∑y + b∑y^{2}
Where,
n is the number of (x, y) pairs
∑x is the sum of all x values
∑y is the sum of all y values
∑xy is the sum of the products of each x value and its corresponding y value
∑y^{2} is the sum of squares of the y values
∑x^{2} is the sum of squares of the x values
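As a minimal sketch of how the pair of simultaneous equations for the regression of y on x can be solved, the Python function below builds the sums defined above and eliminates ‘a’ and ‘b’ directly. The function name `fit_line` and the sample data are illustrative assumptions, not part of the original text.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for y = a + b*x.

    sum_y  = n*a     + b*sum_x
    sum_xy = a*sum_x + b*sum_x2
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Determinant of the 2x2 system; zero when every x value is identical.
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / det   # slope
    a = (sum_y - b * sum_x) / n              # intercept
    return a, b


# Made-up data lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Calling `fit_line(ys, xs)` with the arguments swapped solves the second pair of equations and gives the regression of x on y.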
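The regression equation of x on y can also be expressed as follows:
x − x̄ = b_{xy} (y − ȳ)
and that of y on x can be expressed as,
y − ȳ = b_{yx} (x − x̄)
where,
the constants b_{xy} and b_{yx} are called the regression coefficients
x̄ and ȳ are the arithmetic means of the x and y values respectively.
b_{xy} and b_{yx} can be calculated using the expressions given below:
b_{xy} = (n∑xy − ∑x∑y) / (n∑y^{2} − (∑y)^{2})
b_{yx} = (n∑xy − ∑x∑y) / (n∑x^{2} − (∑x)^{2})
Equivalently, in terms of the coefficient of correlation r and the standard deviations σ_{x} and σ_{y} of the x and y values:
b_{xy} = r σ_{x} / σ_{y}
b_{yx} = r σ_{y} / σ_{x}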
X | 2 | 4 | 5 | 5 | 8 | 10 |
Y | 6 | 7 | 9 | 10 | 12 | 12 |
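Treating the X and Y values tabulated above as the sample, the sketch below computes both regression coefficients from the raw sums and prints the two regression lines in deviation form. The helper name `regression_coefficients` is an assumption made for illustration.

```python
xs = [2, 4, 5, 5, 8, 10]
ys = [6, 7, 9, 10, 12, 12]


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y        # shared by both coefficients
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # slope of y on x
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # slope of x on y
    return b_yx, b_xy


b_yx, b_xy = regression_coefficients(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(f"y on x: y - {y_bar:.3f} = {b_yx:.3f} (x - {x_bar:.3f})")
print(f"x on y: x - {x_bar:.3f} = {b_xy:.3f} (y - {y_bar:.3f})")
```

For this data the sums are ∑x = 34, ∑y = 56, ∑xy = 351, ∑x^{2} = 234 and ∑y^{2} = 554, so b_{yx} = 202/248 ≈ 0.815 and b_{xy} = 202/188 ≈ 1.074.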
Statistic | x | y |
Mean | 98 | 28 years |
Standard deviation | 2 | 4 years |
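When only the means and standard deviations are known, as in the summary table above, both regression lines can be obtained from the relations b_{yx} = r σ_{y} / σ_{x} and b_{xy} = r σ_{x} / σ_{y}, provided the coefficient of correlation r is also known. The table does not state r, so the value 0.6 used below is purely a placeholder assumption, as is the function name `lines_from_summary`.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a, b) for y = a + b*x, (a, b) for x = a + b*y)."""
    b_yx = r * sd_y / sd_x          # regression coefficient of y on x
    b_xy = r * sd_x / sd_y          # regression coefficient of x on y
    a_yx = mean_y - b_yx * mean_x   # the line passes through (mean_x, mean_y)
    a_xy = mean_x - b_xy * mean_y
    return (a_yx, b_yx), (a_xy, b_xy)


# Means and standard deviations from the table; r = 0.6 is a placeholder
# chosen only to make the sketch runnable, not a value from the source.
(a_yx, b_yx), (a_xy, b_xy) = lines_from_summary(98, 28, 2, 4, r=0.6)
print(f"y on x: y = {a_yx:.2f} + {b_yx:.2f} x")
print(f"x on y: x = {a_xy:.2f} + {b_xy:.2f} y")
```

Replacing the placeholder with the actual coefficient of correlation for the data gives the lines needed to estimate y from x, or x from y.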
Note: The regression equation of x on y is used for the estimation of x values and the regression equation of y on x is used for the estimation of y values.
Properties of Regression Coefficient
- The product of the two regression coefficients equals the square of the coefficient of correlation: r^{2} = b_{yx} × b_{xy}, where r is the coefficient of correlation and b_{yx} and b_{xy} are the regression coefficients
- Since the coefficient of correlation cannot exceed 1 in absolute value, the product of the regression coefficients cannot be greater than 1
- Both regression coefficients have the same sign as r
- The arithmetic mean of the two regression coefficients is, in absolute value, greater than or equal to the absolute value of the coefficient of correlation
- Regression coefficients are independent of change of origin, but not of change of scale
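The properties listed above can be checked numerically. The sketch below is one way to do so on a small sample; the helper names `coeffs` and `pearson_r`, and the particular shifts and scale factors, are assumptions chosen for illustration.

```python
import math


def coeffs(xs, ys):
    """Return (b_yx, b_xy) from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    b_yx = num / (n * sum(x * x for x in xs) - sx ** 2)
    b_xy = num / (n * sum(y * y for y in ys) - sy ** 2)
    return b_yx, b_xy


def pearson_r(xs, ys):
    """Coefficient of correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [2, 4, 5, 5, 8, 10]
ys = [6, 7, 9, 10, 12, 12]
b_yx, b_xy = coeffs(xs, ys)
r = pearson_r(xs, ys)

# Product of the coefficients equals r squared, so it cannot exceed 1.
assert math.isclose(b_yx * b_xy, r ** 2)
assert b_yx * b_xy <= 1

# Both coefficients carry the sign of r.
assert math.copysign(1, b_yx) == math.copysign(1, r) == math.copysign(1, b_xy)

# Arithmetic mean of the coefficients is at least |r| in absolute value.
assert abs(b_yx + b_xy) / 2 >= abs(r)

# Change of origin (u = x - 5, v = y - 9) leaves the coefficients unchanged.
shifted = coeffs([x - 5 for x in xs], [y - 9 for y in ys])
assert math.isclose(shifted[0], b_yx) and math.isclose(shifted[1], b_xy)

# Change of scale (u = x / 2, v = y / 3) rescales b_yx by the factor 2 / 3.
scaled = coeffs([x / 2 for x in xs], [y / 3 for y in ys])
assert math.isclose(scaled[0], b_yx * 2 / 3)

print("all properties hold for this sample")
```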
Properties of Regression Lines
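- The two lines of regression intersect at the point (x̄, ȳ), the arithmetic means of x and y
- If there is perfect correlation, r = ±1, the regression lines coincide
- The angle ‘A’ between the regression lines is given by tan A = ((1 − r^{2}) / |r|) × (σ_{x} σ_{y} / (σ_{x}^{2} + σ_{y}^{2})), where σ_{x} and σ_{y} are the standard deviations of x and y
- The regression lines are perpendicular to each other when r = 0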
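A short sketch of the angle property, using the standard result tan A = ((1 − r^{2}) / |r|) × (σ_{x} σ_{y} / (σ_{x}^{2} + σ_{y}^{2})). The function name and the sample values of r, σ_{x} and σ_{y} are assumptions for illustration.

```python
import math


def angle_between_regression_lines(r, sd_x, sd_y):
    """Acute angle, in degrees, between the two regression lines."""
    if r == 0:
        # tan A is unbounded: the lines are perpendicular.
        return 90.0
    tan_a = (1 - r ** 2) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return math.degrees(math.atan(tan_a))


print(angle_between_regression_lines(0, 2, 4))    # 90.0
print(angle_between_regression_lines(1, 2, 4))    # 0.0
print(angle_between_regression_lines(-1, 2, 4))   # 0.0

# Cross-check against the slopes of the two lines in the (x, y) plane:
# the line of y on x has slope b_yx, the line of x on y has slope 1 / b_xy.
r, sd_x, sd_y = 0.5, 2, 4
m1 = r * sd_y / sd_x
m2 = sd_y / (r * sd_x)
from_slopes = math.degrees(math.atan(abs((m2 - m1) / (1 + m1 * m2))))
assert math.isclose(angle_between_regression_lines(r, sd_x, sd_y), from_slopes)
```

The two limiting cases match the properties stated above: the lines coincide (angle 0) when r = ±1 and are perpendicular when r = 0.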