Monday, May 06, 2024

Business Statistics - Correlation and Regression (Formulas)

 

Correlation and Regression: Formulas

 

Measures of central tendency and measures of dispersion describe one variable only, e.g. the mean height of the students of a class or the standard deviation of their weights. In many situations, however, two variables have to be studied simultaneously. For example, the variables may be:

(i) The amount of rainfall and the yield of a certain crop,

(ii) The height and weight of a group of children,

(iii) Income and expenditure of several families,

(iv) Ages of husband and wife,

(v) Rise / fall of temperature and increase / decrease in the sale of cold drinks,

(vi) General income level of a country and the percentage of literate population of the country, etc.

 

Two variables may be required to be studied simultaneously for the following two objectives:

1. To measure numerically the strength of association and the nature of the relationship between the two variables (this is the problem of correlation), and

2. To make estimates or predictions regarding the principal variable when the value of the other variable is known (this is the problem of regression).

 

Thus, in short, correlation is concerned with the measurement of the ‘strength of association’ between the variables; while regression is concerned with the ‘prediction’ of the most likely value of one variable when the value of the other variable is known.

 

In simple correlation (also called linear correlation) the strength of the linear relationship between the variables is considered. Similarly, in simple regression (also called linear regression) a linear equation connecting the variables is considered.


Correlation

The word “correlation” is used to denote the degree of association between variables. If two variables ‘x’ and ‘y’ are so related that variations in the magnitude of one variable tend to be accompanied by variations in the magnitude of the other variable, they are said to be correlated. If ‘y’ tends to increase as ‘x’ increases, the variables are said to be positively correlated. If ‘y’ tends to decrease as ‘x’ increases, the variables are said to be negatively correlated. If the values of ‘y’ are not affected by changes in the values of ‘x’, the variables are said to be uncorrelated.

 

Linear correlation or simple correlation (i.e. the degree of linear association between two variables) is measured by the correlation coefficient.

 

Formulas of Correlation

 

Pearson’s formulas for Correlation Coefficient (r):

 DIRECT METHOD

If ‘x’ and ‘y’ are two variables, the correlation coefficient between them, r, is given by any of the following equivalent formulas:

1. r = cov(x, y) ÷ (σx σy)

Where,

(i) σx = Standard deviation of the x-series

(ii) σy = Standard deviation of the y-series

(iii) cov(x, y) = (1/n)[∑(x – Mean of x)(y – Mean of y)] = [(∑xy)/n] – [{(∑x)/n}.{(∑y)/n}]

2. r = [∑(x – Mean of x)(y – Mean of y)] ÷ [∑{(x – Mean of x)^2}.∑{(y – Mean of y)^2}]^(1/2)

3. r = [∑xy – n(Mean of x)(Mean of y)] ÷ [{∑x^2 – n(Mean of x)^2}.{∑y^2 – n(Mean of y)^2}]^(1/2)

4. r = [n∑xy – (∑x).(∑y)] ÷ [{n∑x^2 – (∑x)^2}.{n∑y^2 – (∑y)^2}]^(1/2)
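The direct-method formulas can be checked against each other with a few lines of code. The following is a minimal Python sketch, not a reference implementation: the function names and the sample data are made up for illustration. It computes r from raw sums (formula 4) and from the covariance and standard deviations (formula 1), using the divisor n throughout as in the definitions above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums only (formula 4)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def pearson_r_from_cov(xs, ys):
    """Pearson's r as cov(x, y) / (sd_x * sd_y) (formula 1), divisor n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Illustrative data (invented for this example)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_r(x, y), 4))           # 0.7746
print(round(pearson_r_from_cov(x, y), 4))  # 0.7746
```

Formula 4 is usually the most convenient for hand calculation because it needs only the five column totals ∑x, ∑y, ∑xy, ∑x^2 and ∑y^2.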


 SHORT-CUT METHOD

If ‘x’ and ‘y’ are two variables, and X = x – c and Y = y – d (where, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ –

rxy = rXY

 

 STEP DEVIATION METHOD

If ‘x’ and ‘y’ are two variables, and u = (x – a)/b and v = (y – c)/d (where, a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ –

rxy = ± ruv

(According as b and d have the same sign, or opposite signs)
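The sign rule for the step deviation method can be seen numerically. The sketch below is only an illustration under assumed data: the constants a, b, c, d and the two series are invented, and `pearson_r` is a helper written for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )


# Illustrative data (invented for this example)
x = [10, 20, 30, 40, 50]
y = [100, 150, 250, 300, 350]

# u = (x - a)/b with a = 30, b = 10
u = [(xi - 30) / 10 for xi in x]
# v = (y - c)/d with c = 250 and d = 50 (same sign as b) or d = -50 (opposite sign)
v_same_sign = [(yi - 250) / 50 for yi in y]
v_opposite_sign = [(yi - 250) / -50 for yi in y]

r_xy = pearson_r(x, y)
print(r_xy, pearson_r(u, v_same_sign))      # equal: r_xy = +r_uv
print(r_xy, pearson_r(u, v_opposite_sign))  # equal in size, opposite in sign: r_xy = -r_uv
```

The short-cut method is the special case b = d = 1, where only the origin is shifted and the sign never changes.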

 

Spearman’s formulas for Correlation Coefficient (r):

RANK CORRELATION

Formula 1:

r = 1 – [{6∑(d^2)} ÷ {(n^3) – n}]

 

Where,

d = Difference between the two ranks of each individual observation, and

n = Number of individual observations

 

Formula 2 (used when some ranks are tied):

r = 1 – [6{∑(d^2) + (1/12)∑((m^3) – m)} ÷ {(n^3) – n}]

 

Where,

d = Difference between the two ranks of each individual observation,

n = Number of individual observations, and

m = Number of individual observations involved in a tie, whether in the first or the second series. The term ((m^3) – m)/12 is added once for every group of tied observations.
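The tie-corrected rank formula is easier to follow with a worked computation. The Python sketch below is one possible way to carry it out, assuming that rank 1 goes to the largest value and that tied values share the average of the ranks they occupy; the helper names and the marks data are made up for this illustration.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values in one series."""
    return sum(m ** 3 - m for m in Counter(values).values() if m > 1) / 12


def spearman_r(xs, ys):
    """Rank correlation by formula 2; reduces to formula 1 when there are no ties."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


# Illustrative marks of ten students in two subjects (invented for this example)
marks_a = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
marks_b = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

print(round(spearman_r(marks_a, marks_b), 3))  # 0.545
```

Here ∑(d^2) = 72, and there are three tie groups (m = 2 and m = 3 in the first series, m = 2 in the second), giving a correction of (6 + 24 + 6)/12 = 3, so r = 1 – 6(75)/990.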

 

 

Important Properties of Correlation Coefficient

 

1.       The correlation coefficient ‘r’ is independent of a change of origin and, apart from its sign, of a change of scale of the observations. Owing to this property, if ‘x’ and ‘y’ are two variables, and u = (x – a)/b and v = (y – c)/d (where a, b, c and d are constants), the correlation coefficient between ‘x’ and ‘y’ is rxy = ± ruv (according as b and d have the same sign, or opposite signs).

2.       The correlation coefficient ‘r’ is a pure number and is independent of the units of measurement.

3.       The correlation coefficient ‘r’ lies between (− 1) and (+ 1); i.e. ‘r’ cannot exceed 1 numerically. That is, mathematically,

(− 1) ≤ ‘r’ ≤ (+ 1)

 

Regression

The word “regression” is used to denote estimation or prediction of the average value of one variable for a specified value of the other variable. The estimation is done by means of suitable equations, derived on the basis of available bivariate data. Such an equation is known as a Regression Equation.

In linear regression (or simple regression) the relationship between the variables is assumed to be linear.

 

Formulas of Regression

Regression Equation by the Method of Normal Equations

(i) For Regression Equation of X on Y i.e. for Regression Equation: X = a + bY,

     Normal Equations are:

∑X = na + b∑Y and

∑XY = a∑Y + b∑(Y^2)

(ii) For Regression Equation of Y on X i.e. for Regression Equation: Y = a + bX,

      Normal Equations are:

∑Y = na + b∑X and

∑XY = a∑X + b∑(X^2)
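The two normal equations form a pair of simultaneous linear equations in a and b. As a rough sketch, the Python code below solves them for the regression of Y on X using Cramer's rule, which is just one convenient way to solve a 2 × 2 system; the function name and the data are invented for this example.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations
         sum(Y)  = n*a      + b*sum(X)
         sum(XY) = a*sum(X) + b*sum(X^2)
    for a and b by Cramer's rule."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    determinant = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / determinant
    b = (n * sum_xy - sum_x * sum_y) / determinant
    return a, b


# Illustrative data (invented for this example)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a, b = fit_y_on_x(X, Y)
print(f"Y = {a:.1f} + {b:.1f}X")  # Y = 2.2 + 0.6X
```

The regression of X on Y is obtained in the same way by swapping the roles of the two series, i.e. calling `fit_y_on_x(Y, X)` and reading the result as X = a + bY.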

 

Regression Equation by the Method of Regression Coefficients:

(i) Regression Equation of X on Y:

X − Mean of X = bXY (Y − Mean of Y)

(ii) Regression Equation of Y on X:

Y − Mean of Y = bYX (X − Mean of X)

 

Here, bXY and bYX are called Regression Coefficients. These Regression Coefficients can be calculated by the following formulas:

 

bXY

1. bXY = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑Y^2)/n − {(∑Y)/n}^2]

2. bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y^2 − (∑Y)^2]

3. bXY = [Cov(X,Y)] ÷ [(σY)^2]

4. bXY = r [(σX)/(σY)]

5. bXY = (h/k)buv

 

Where,

buv = [n∑uv − (∑u)(∑v)] ÷ [n∑v^2 − (∑v)^2]

u = (X − Assumed Mean)/h,

v = (Y − Assumed Mean)/k, and

‘h’ and ‘k’ are constants.

 

bYX

1. bYX = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑X^2)/n − {(∑X)/n}^2]

2. bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X^2 − (∑X)^2]

3. bYX = [Cov(X,Y)] ÷ [(σX)^2]

4. bYX = r [(σY)/(σX)]

5. bYX = (k/h)bvu

 

Where,

bvu = [n∑uv − (∑u)(∑v)] ÷ [n∑u^2 − (∑u)^2]

u = (X − Assumed Mean)/h,

v = (Y − Assumed Mean)/k, and

‘h’ and ‘k’ are constants.
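To see that the step deviation formulas (formula 5) agree with the direct ones (formula 2), the following Python sketch computes both regression coefficients each way. It is only an illustration: the assumed means (30 and 250), the constants h = 10 and k = 50, the helper name and the data are all invented for this example.

```python
def regression_coefficient(dependent, independent):
    """Regression coefficient of `dependent` on `independent` (formula 2)."""
    n = len(independent)
    sum_i, sum_d = sum(independent), sum(dependent)
    sum_id = sum(i * d for i, d in zip(independent, dependent))
    sum_i2 = sum(i * i for i in independent)
    return (n * sum_id - sum_i * sum_d) / (n * sum_i2 - sum_i ** 2)


# Illustrative data (invented for this example)
X = [10, 20, 30, 40, 50]
Y = [100, 150, 250, 300, 350]

# Step deviations: u = (X - 30)/h, v = (Y - 250)/k
h, k = 10, 50
u = [(x - 30) / h for x in X]
v = [(y - 250) / k for y in Y]

b_yx = regression_coefficient(Y, X)  # Y on X, formula 2
b_xy = regression_coefficient(X, Y)  # X on Y, formula 2
b_vu = regression_coefficient(v, u)  # v on u
b_uv = regression_coefficient(u, v)  # u on v

print(b_yx, (k / h) * b_vu)  # both give bYX
print(b_xy, (h / k) * b_uv)  # both give bXY
```

Unlike the correlation coefficient, the regression coefficients are not independent of a change of scale, which is why the factors k/h and h/k appear.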

 

Important Properties of Linear Regression

 

1.       The product of the two regression coefficients is equal to the square of the correlation coefficient. Mathematically, (bYX) × (bXY) = r^2.

2.       r, bYX and bXY all have the same sign. If the correlation coefficient ‘r’ is zero, the regression coefficients bYX and bXY are also zero.

3.       The regression lines always intersect at the point (Mean of X, Mean of Y). With X measured along the horizontal axis, the slopes of the regression line of Y on X and the regression line of X on Y are respectively bYX and 1/bXY.

4.       The angle between the two regression lines depends on the correlation coefficient ‘r’. When r = 0, the two lines are perpendicular to each other; when r = + 1, or r = − 1, they coincide. As ‘r’ increases numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
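These properties can be verified numerically. The Python sketch below, with invented function names and data, computes both regression coefficients, checks that their product equals r^2, and measures the acute angle between the two regression lines from their direction vectors (1, bYX) and (bXY, 1), with X taken along the horizontal axis. Using direction vectors rather than slopes is a choice made here so that the case bXY = 0 does not require dividing by zero.

```python
from math import acos, degrees, sqrt


def regression_coefficients(xs, ys):
    """Return (bYX, bXY) from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy


def angle_between_regression_lines(b_yx, b_xy):
    """Acute angle in degrees between the lines with directions (1, bYX) and (bXY, 1)."""
    dot = abs(b_xy + b_yx)
    norms = sqrt(1 + b_yx ** 2) * sqrt(1 + b_xy ** 2)
    return degrees(acos(min(1.0, dot / norms)))  # min() guards against rounding above 1


# Illustrative data (invented for this example)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(X, Y)
print(round(b_yx * b_xy, 4))                                # 0.6, which is r^2
print(round(angle_between_regression_lines(b_yx, b_xy), 2)) # 14.04 degrees

# Uncorrelated data: both coefficients are zero and the lines are perpendicular
print(angle_between_regression_lines(*regression_coefficients([1, 2, 3, 4, 5], [1, 2, 3, 2, 1])))  # 90.0
```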

 

Other Important Formulas

1. Coefficient of determination = r^2

2. Coefficient of non-determination = 1 – r^2

3. Coefficient of concurrent deviation, rc = ± [± {(2c – m)/m}]^(1/2)

Where,

c = No. of concurrent deviations (i.e., the number of (+)ve signs in the product-of-deviations column), and

m = Total number of deviations (i.e., 1 less than the number of pairs).

 

Important notes:

1. If (2c – m) > 0, both outside and inside the square root the sign will be (+)ve, and

2. If (2c – m) < 0, both outside and inside the square root the sign will be (−)ve.
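The sign rule in the notes above translates directly into code. The following Python sketch is one way to compute the coefficient of concurrent deviation; the function name and data are invented, and it assumes that a pair of deviations counts as concurrent only when the product of the two changes is strictly positive (the example data contain no zero changes, so this assumption is not tested here).

```python
from math import sqrt


def concurrent_deviation_coefficient(xs, ys):
    """rc = +/- sqrt(+/- (2c - m)/m), both signs taken from the sign of (2c - m)."""
    # Deviation of each value from the one before it
    dx = [later - earlier for earlier, later in zip(xs, xs[1:])]
    dy = [later - earlier for earlier, later in zip(ys, ys[1:])]
    m = len(dx)  # one less than the number of pairs
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)  # concurrent deviations
    ratio = (2 * c - m) / m
    sign = 1 if ratio >= 0 else -1
    return sign * sqrt(abs(ratio))


# Illustrative data (invented for this example)
x = [10, 12, 11, 15, 18, 17, 20]
y = [5, 7, 6, 9, 8, 7, 10]

print(round(concurrent_deviation_coefficient(x, y), 4))  # 0.8165
```

Here m = 6 and c = 5, so (2c – m)/m = 4/6 and rc = +(4/6)^(1/2).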

 
