Correlation and Regression – Formulas
In measures of central tendency and measures of dispersion we study the characteristics of one variable only, e.g. the mean of the distribution of heights of the students of a class, the standard deviation of their weights, etc. But many situations arise in which we have to study two variables simultaneously. For example, the variables may be:
(i) The amount of rainfall and the yield of a certain crop,
(ii) The height and weight of a group of children,
(iii) Income and expenditure of several families,
(iv) Ages of husband and wife,
(v) Rise/fall of temperature and increase/decrease in the sale of cold drinks,
(vi) General income level of a country and the percentage of literate population of the country, etc.
Two variables may be required to be studied simultaneously for the following two objectives:
1. To measure numerically the strength of association and nature of relationship between the two variables (This is the problem of correlation), and
2. To make estimates or predictions regarding the principal variable when the value of the other variable is known (This is the problem of regression).
Thus, in short, correlation is concerned with the measurement of the ‘strength of association’ between the variables, while regression is concerned with the ‘prediction’ of the most likely value of one variable when the value of the other variable is known.
In simple correlation (also called linear correlation) the strength of the linear relationship between the variables is considered. Similarly, in simple regression (also called linear regression) a linear equation between the variables is considered.
Correlation
The word “correlation” is used to denote the degree of association between variables. If two variables ‘x’ and ‘y’ are so related that variations in the magnitude of one variable tend to be accompanied by variations in the magnitude of the other, they are said to be correlated. If ‘y’ tends to increase as ‘x’ increases, the variables are said to be positively correlated. If ‘y’ tends to decrease as ‘x’ increases, the variables are said to be negatively correlated. If the values of ‘y’ are not affected by changes in the values of ‘x’, the variables are said to be uncorrelated.
Linear correlation or simple correlation (i.e. the degree of association between two variables) is measured by the Correlation Coefficient.
Formulas of Correlation
Pearson’s formulas for the Correlation Coefficient (r):
DIRECT METHOD
If ‘x’ and ‘y’ are two variables, the Correlation Coefficient r between them can be computed by any of the following:
1. r = Cov(x, y) ÷ (σx·σy)
2. r = [∑(x − Mean of x)(y − Mean of y)] ÷ [∑{(x − Mean of x)^2} · ∑{(y − Mean of y)^2}]^(1/2)
3. r = [∑xy − n(Mean of x)(Mean of y)] ÷ [{∑x^2 − n(Mean of x)^2} · {∑y^2 − n(Mean of y)^2}]^(1/2)
4. r = [n∑xy − (∑x)(∑y)] ÷ [{n∑x^2 − (∑x)^2} · {n∑y^2 − (∑y)^2}]^(1/2)
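As a concrete check of the direct method, here is a small Python sketch using formula 4 above; the data values are invented purely for illustration.

```python
# Pearson's r by the direct method (formula 4):
# r = [n∑xy − (∑x)(∑y)] ÷ [{n∑x² − (∑x)²}·{n∑y² − (∑y)²}]^(1/2)

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5
    return num / den

# Illustrative (made-up) data:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # → 0.7746
```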
SHORT-CUT METHOD
If ‘x’ and ‘y’ are two variables, and X = x − c and Y = y − d (where c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = rXY.
STEP DEVIATION METHOD
If ‘x’ and ‘y’ are two variables, and u = (x − a)/b and v = (y − c)/d (where a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = ± ruv (according as b and d have the same sign, or opposite signs).
Spearman’s formulas for the Correlation Coefficient (r):
RANK CORRELATION
Formula 1 (no ties):
r = 1 − [6∑(d^2) ÷ {n^3 − n}]
Where d = difference of the ranks of the respective individual observations, and n = number of individual observations.
Formula 2 (with ties):
r = 1 − [6{∑(d^2) + (1/12)∑(m^3 − m)} ÷ {n^3 − n}]
Where d and n are as above, and m = number of individual observations involved in a tie, whether in the first or the second series.
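For the untied case, Formula 1 can be sketched in Python as follows; the ranking helper assumes all values within each series are distinct (the situation Formula 1 covers), and the data are made up.

```python
def spearman_r(x, y):
    # Formula 1: r = 1 − 6∑(d²) ÷ (n³ − n), valid when there are no ties
    n = len(x)
    def ranks(v):  # ascending ranks, 1 .. n (assumes distinct values)
        order = {val: i + 1 for i, val in enumerate(sorted(v))}
        return [order[val] for val in v]
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)

# Illustrative data (distinct values in each series):
print(spearman_r([10, 20, 30, 40, 50], [5, 6, 9, 7, 8]))  # → 0.7
```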
Important Properties of Correlation Coefficient
1. The correlation coefficient ‘r’ is independent of the change of both origin and scale of the observations. Owing to this property, if ‘x’ and ‘y’ are two variables, and u = (x − a)/b and v = (y − c)/d (where a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = ± ruv (according as b and d have the same sign, or opposite signs).
2. The correlation coefficient ‘r’ is a pure number and is independent of the units of measurement.
3. The correlation coefficient ‘r’ lies between −1 and +1; i.e. ‘r’ cannot exceed 1 numerically. That is, mathematically, −1 ≤ r ≤ +1.
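Property 1 is easy to verify numerically. The sketch below, with invented data and arbitrary constants, checks that r is unchanged under a change of origin and a same-sign change of scale, and flips sign when the scale factors have opposite signs.

```python
def r(x, y):
    # Pearson's r, formula 4 of the direct method
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = ((n * sum(a * a for a in x) - sum(x) ** 2)
           * (n * sum(b * b for b in y) - sum(y) ** 2)) ** 0.5
    return num / den

x = [2, 4, 6, 8]
y = [1, 3, 2, 5]
u = [(xi - 5) / 2 for xi in x]    # u = (x − a)/b with b = +2
v = [(yi - 2) / 1 for yi in y]    # v = (y − c)/d with d = +1
w = [(yi - 2) / -1 for yi in y]   # same, but d = −1

print(abs(r(x, y) - r(u, v)) < 1e-12)  # True: same signs, r_xy = r_uv
print(abs(r(x, y) + r(u, w)) < 1e-12)  # True: opposite signs, r_xy = −r_uw
```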
Regression
The word “regression” is used to denote estimation or prediction of the average value of one variable for a specified value of the other variable. The estimation is done by means of suitable equations, derived on the basis of available bivariate data. Such an equation is known as a Regression Equation.
In linear regression (or simple regression) the relationship between the variables is assumed to be linear.
Formulas of Regression
Regression Equation by the Method of Normal Equations
(i) For the Regression Equation of X on Y, i.e. for the Regression Equation X = a + bY, the Normal Equations are:
∑X = na + b∑Y, and
∑XY = a∑Y + b∑(Y^2)
(ii) For the Regression Equation of Y on X, i.e. for the Regression Equation Y = a + bX, the Normal Equations are:
∑Y = na + b∑X, and
∑XY = a∑X + b∑(X^2)
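The two normal equations for Y on X form a 2×2 linear system in a and b, so they can be solved directly. A minimal Python sketch (with made-up data) does this by Cramer's rule:

```python
def fit_y_on_x(x, y):
    # Solve  ∑Y = na + b∑X  and  ∑XY = a∑X + b∑(X²)  for a and b
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    det = n * sx2 - sx * sx
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

# Illustrative data: the fitted line is Y = 0.5 + 1.4X
a, b = fit_y_on_x([1, 2, 3, 4], [2, 3, 5, 6])
print(round(a, 4), round(b, 4))  # → 0.5 1.4
```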
Regression Equation by the Method of Regression Coefficients
(i) Regression Equation of X on Y: X − Mean of X = bXY (Y − Mean of Y)
(ii) Regression Equation of Y on X: Y − Mean of Y = bYX (X − Mean of X)
Here, bXY and bYX are called Regression Coefficients. These Regression Coefficients can be calculated by the following formulas:
bXY:
1. bXY = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑Y^2)/n − {(∑Y)/n}^2]
2. bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y^2 − (∑Y)^2]
3. bXY = Cov(X, Y) ÷ (σY)^2
4. bXY = r (σX/σY)
5. bXY = (h/k)·buv
Where buv = [n∑uv − (∑u)(∑v)] ÷ [n∑v^2 − (∑v)^2], u = (X − Assumed Mean)/h, v = (Y − Assumed Mean)/k, and ‘h’ and ‘k’ are constants.
bYX:
1. bYX = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑X^2)/n − {(∑X)/n}^2]
2. bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X^2 − (∑X)^2]
3. bYX = Cov(X, Y) ÷ (σX)^2
4. bYX = r (σY/σX)
5. bYX = (k/h)·bvu
Where bvu = [n∑uv − (∑u)(∑v)] ÷ [n∑u^2 − (∑u)^2], u = (X − Assumed Mean)/h, v = (Y − Assumed Mean)/k, and ‘h’ and ‘k’ are constants.
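Formula 2 for bXY and bYX uses the same numerator for both coefficients, which makes them easy to compute together. A small Python sketch with made-up data:

```python
def regression_coefficients(x, y):
    # bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X² − (∑X)²]
    # bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y² − (∑Y)²]
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    num = n * sxy - sx * sy
    return num / (n * sy2 - sy ** 2), num / (n * sx2 - sx ** 2)  # (bXY, bYX)

# Illustrative data:
bxy, byx = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(bxy, byx)   # → 1.0 0.6
print(bxy * byx)  # equals r² (here 0.6, so r ≈ 0.7746)
```

The last line illustrates the identity (bYX) × (bXY) = r² stated among the properties below.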
Important Properties of Linear Regression
1. The product of the two regression coefficients is equal to the square of the correlation coefficient. Mathematically, (bYX) × (bXY) = r^2.
2. r, bYX and bXY all have the same sign. If the correlation coefficient ‘r’ is zero, the regression coefficients bYX and bXY are also zero.
3. The regression lines always intersect at the point (Mean of X, Mean of Y). The slopes of the regression line of Y on X and the regression line of X on Y are respectively bYX and 1/bXY.
4. The angle between the two regression lines depends on the correlation coefficient ‘r’. When r = 0, the two lines are perpendicular to each other; when r = +1 or r = −1, they coincide. As ‘r’ increases numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
Other Important Formulas
1. Coefficient of determination = r^2
2. Coefficient of non-determination = 1 − r^2
3. Coefficient of concurrent deviation: rc = ± [± {(2c − m)/m}]^(1/2)
Where c = number of concurrent deviations (i.e., the number of (+)ve signs in the product-of-deviations column), and m = total number of deviations (i.e., 1 less than the number of pairs).
Important notes:
1. If (2c − m) > 0, the sign both outside and inside the square root is (+)ve, and
2. If (2c − m) < 0, the sign both outside and inside the square root is (−)ve.
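As an illustration, the coefficient of concurrent deviation can be sketched as below; the code assumes no two consecutive values are equal (so every deviation is a clear rise or fall), and the data are invented.

```python
def concurrent_deviation_r(x, y):
    # Signs of successive deviations: +1 for a rise, −1 for a fall
    dx = [1 if b > a else -1 for a, b in zip(x, x[1:])]
    dy = [1 if b > a else -1 for a, b in zip(y, y[1:])]
    m = len(dx)                                  # 1 less than the number of pairs
    c = sum(sx == sy for sx, sy in zip(dx, dy))  # concurrent deviations
    t = (2 * c - m) / m
    # The sign outside and inside the root follows the sign of (2c − m)
    return abs(t) ** 0.5 if t >= 0 else -((-t) ** 0.5)

# Both series rise and fall together at every step, so rc = +1
print(concurrent_deviation_r([1, 2, 3, 2, 4, 5], [2, 3, 5, 4, 6, 8]))  # → 1.0
```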