Monday, May 06, 2024

Business Statistics - Correlation and Regression (Formulas and Selected Problems and Solutions)

 


 

Part A:

In Part A you will find -

1. Discussion about what is 'correlation' and 'correlation coefficient';

2. Discussion about what is 'regression' and 'regression equation';

3. Formulas for calculating 'correlation coefficient' including 'rank correlation coefficient';

4. Important properties of correlation coefficient;

5. Formulas and Methods for finding the regression equations; and

6. Important properties of linear regression.


Part B:

In Part B you will find twenty-six selected problems with solutions.



Part A


In measures of central tendency and measures of dispersion we study the characteristics of one variable only, e.g. the mean of the distribution of heights of the students of a class, or the standard deviation of the weights of the students. In many situations, however, we have to study two variables simultaneously. For example, the variables may be:

(i) The amount of rainfall and yield of a certain crop,

(ii) The height and weight of a group of children,

(iii) Income and expenditure of several families,

(iv) Ages of husband and wife,

(v) Rise / fall of temperature and increase / decrease in sale of cold drinks,

(vi) General income level of a country and percentage of literate population of the country, etc.

 

Two variables may be required to be studied simultaneously for the following two objectives:

1. To measure numerically the strength of association and the nature of the relationship between the two variables (this is the problem of correlation), and

2. To make estimates or predictions regarding the principal variable when the value of the other variable is known (this is the problem of regression).

 

Thus, in short, correlation is concerned with the measurement of the ‘strength of association’ between the variables; while regression is concerned with the ‘prediction’ of the most likely value of one variable when the value of the other variable is known.

 

In simple correlation (also called linear correlation) the strength of the linear relationship between the variables is considered. Similarly, in simple regression (also called linear regression) a linear equation between the variables is considered.


Correlation

The word “correlation” is used to denote the degree of association between variables. If two variables ‘x’ and ‘y’ are so related that variations in the magnitude of one variable tend to be accompanied by variations in the magnitude of the other variable, they are said to be correlated. If ‘y’ tends to increase as ‘x’ increases, the variables are said to be positively correlated. If ‘y’ tends to decrease as ‘x’ increases, the variables are said to be negatively correlated. If the values of ‘y’ are not affected by changes in the values of ‘x’, the variables are said to be uncorrelated.

 

The linear correlation or simple correlation (i.e. the degree of linear association between two variables) is measured by the Correlation Coefficient.

 

Formulas of Correlation

 

Pearson’s formulas for Correlation Coefficient (r):

DIRECT METHOD

If ‘x’ and ‘y’ are two variables, the Correlation Coefficient between them, r, is given by any one of the following equivalent formulas:

1. r = [cov(x, y)] ÷ [σx.σy]

Where,

(i) σx = Standard Deviation of x-series

(ii) σy = Standard Deviation of y-series

(iii) cov(x, y) = (1/n)[∑(x – Mean of x)(y – Mean of y)] = [(∑xy)/n] – [{(∑x)/n}.{(∑y)/n}]

2. r = [∑(x – Mean of x)(y – Mean of y)] ÷ [∑{(x – Mean of x)^2}.∑{(y – Mean of y)^2}]^(1/2)

3. r = [∑xy – n(Mean of x)(Mean of y)] ÷ [{∑x^2 – n(Mean of x)^2}.{∑y^2 – n(Mean of y)^2}]^(1/2)

4. r = [n∑xy – (∑x).(∑y)] ÷ [{n∑x^2 – (∑x)^2}.{n∑y^2 – (∑y)^2}]^(1/2)
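As a rough numerical check that the direct-method formulas agree, the sketch below evaluates formula 1 (covariance over the product of standard deviations) and formula 4 (raw sums) on the data of Problem 1 in Part B. The function names are illustrative only, and the standard deviations are assumed to use the divisor n, as in the covariance above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via formula 4 (raw sums)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_r_from_cov(xs, ys):
    """Pearson's r via formula 1: cov(x, y) / (sd_x * sd_y), all with divisor n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Data of Problem 1 in Part B
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

r = pearson_r(x, y)
print(round(r, 4))                         # -0.9192
print(round(pearson_r_from_cov(x, y), 4))  # -0.9192
```

Here ∑x = 30, ∑y = 40, ∑xy = 214, ∑x^2 = 220 and ∑y^2 = 340, so formula 4 gives r = (1070 – 1200) ÷ (200 × 100)^(1/2) = –130 ÷ 141.42, i.e. about –0.92.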


 SHORT-CUT METHOD

If ‘x’ and ‘y’ are two variables, and X = x – c and Y = y – d (where, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ –

rxy = rXY

 

 STEP DEVIATION METHOD

If ‘x’ and ‘y’ are two variables, and u = (x – a)/b and v = (y – c)/d (where a, b, c and d are constants, with b and d non-zero), the Correlation Coefficient between ‘x’ and ‘y’ is

rxy = + ruv, when b and d have the same sign, and

rxy = − ruv, when b and d have opposite signs.
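The effect of a change of origin and scale can be checked numerically. The sketch below uses the data of Problem 11 in Part B with the assumed means 44 and 26 given there; the scale factors 2 and –3 in the step-deviation part are arbitrary values chosen only to show the sign flip, and the helper name is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means (formula 2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data of Problem 11 in Part B
x = [43, 44, 46, 40, 44, 42, 45, 42, 38, 40, 42, 57]
y = [29, 31, 19, 18, 19, 27, 27, 29, 41, 30, 26, 10]

# Short-cut method: deviations from the assumed means 44 and 26
X = [xi - 44 for xi in x]
Y = [yi - 26 for yi in y]

# Step deviation with scales of opposite sign (b = 2, d = -3; arbitrary choices)
u = [(xi - 44) / 2 for xi in x]
v = [(yi - 26) / -3 for yi in y]

r_xy = pearson_r(x, y)
r_XY = pearson_r(X, Y)
r_uv = pearson_r(u, v)

print(round(r_xy, 4), round(r_XY, 4), round(r_uv, 4))
```

The first two values coincide (rxy = rXY), while the third has the same magnitude and the opposite sign because b and d differ in sign.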

 

Spearman’s formulas for Correlation Coefficient (r):

RANK CORRELATION

Formula 1 (when there are no tied ranks):

r = 1 – [{6∑(d^2)} ÷ {(n^3) – n}]

 

Where,

d =  Differences of the ranks of the respective individual observations, and

n =  Number of individual observations
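A minimal sketch of Formula 1, assuming the ranks are already given and contain no ties. It is applied to the two judges' rankings from Problem 12 in Part B; the function name is illustrative.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's rank correlation, Formula 1 (valid only without tied ranks)."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

# Ranks from Problem 12 in Part B
first_judge = [5, 2, 8, 1, 4, 6, 3, 7]
second_judge = [4, 5, 7, 3, 2, 8, 1, 6]

r = spearman_from_ranks(first_judge, second_judge)
print(round(r, 4))  # 0.6667
```

For these ranks ∑(d^2) = 28 and n = 8, so r = 1 – 168 ÷ 504, i.e. about 0.67.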

 

Formula 2 (when there are tied ranks):

r = 1 – [6{∑(d^2) + (1/12)∑((m^3) – m)} ÷ {(n^3) – n}]

 

Where,

d =  Differences of the ranks of the respective individual observations,

n =  Number of individual observations, and

m = Number of individual observations involved in a tie, whether in the first or the second series; the term {(m^3) – m} is added once for every group of tied observations.
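The tie-corrected formula can be sketched as follows, assuming the usual convention that tied observations share the average of the ranks they occupy and that rank 1 goes to the highest value. The data are the marks of Problem 16 in Part B; all function names are illustrative.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for value in set(values):
        first = ordered.index(value) + 1        # best position held by this value
        count = ordered.count(value)            # size of the tie group (m)
        ranks[value] = first + (count - 1) / 2  # mean of the positions occupied
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of equal values in one series."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum((m ** 3 - m) / 12 for m in counts.values())

def spearman_with_ties(xs, ys):
    """Spearman's rank correlation, Formula 2 (with the correction for ties)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Marks from Problem 16 in Part B
accountancy = [25, 30, 38, 22, 50, 70, 30, 90]
economics = [50, 40, 60, 40, 30, 20, 40, 70]

print(average_ranks(accountancy))  # [7.0, 5.5, 4.0, 8.0, 3.0, 2.0, 5.5, 1.0]
r = spearman_with_ties(accountancy, economics)
print(round(r, 4))
```

In this data set the value 30 occurs twice in the first series (m = 2) and 40 occurs three times in the second (m = 3), so the correction is 0.5 + 2 = 2.5. With ∑(d^2) = 81.5 the bracket becomes 6 × 84 ÷ 504 = 1, and r works out to 0.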

 

 

Important Properties of Correlation Coefficient

 

1.       The correlation coefficient ‘r’ is independent of the change of both origin and scale of the observations. Owing to this property, if ‘x’ and ‘y’ are two variables, and u = (x – a)/b and v = (y – c)/d (where a, b, c and d are constants), then rxy = + ruv when b and d have the same sign, and rxy = − ruv when b and d have opposite signs.

2.       The correlation coefficient ‘r’ is a pure number and is independent of the units of measurement.

3.       The correlation coefficient ‘r’ lies between (− 1) and (+ 1); i.e. ‘r’ cannot exceed 1 numerically. That is, mathematically,

(− 1) ≤ ‘r’ ≤ (+ 1)

 

Regression

The word “regression” is used to denote estimation or prediction of the average value of one variable for a specified value of the other variable. The estimation is done by means of suitable equations, derived on the basis of available bivariate data. Such an equation is known as a Regression Equation.

In linear regression (or simple regression) the relationship between the variables is assumed to be linear.

 

Formulas of Regression

Regression Equation by the Method of Normal Equations

(i) For Regression Equation of X on Y i.e. for Regression Equation: X = a + bY,

     Normal Equations are:

∑X = na + b∑Y and

∑XY = a∑Y + b∑(Y^2)

(ii) For Regression Equation of Y on X i.e. for Regression Equation: Y = a + bX,

      Normal Equations are:

∑Y = na + b∑X and

∑XY = a∑X + b∑(X^2)
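The two normal equations form a pair of linear equations in a and b, so they can be solved directly. The sketch below does this for Y = a + bX using Cramer's rule and applies it to the data of Problem 21 in Part B; the function name is illustrative, and the regression of X on Y is obtained simply by swapping the two series.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations for Y = a + bX and return (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n * a     + b * sum_x
    #   sum_xy = a * sum_x + b * sum_x2
    det = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Data of Problem 21 in Part B
X = [2, 6, 4, 3, 2, 2, 8, 4]
Y = [7, 2, 1, 1, 2, 3, 2, 6]

a_yx, b_yx = fit_y_on_x(X, Y)  # Y = a + bX
a_xy, b_xy = fit_y_on_x(Y, X)  # X = a + bY (series swapped)

print(round(a_yx, 4), round(b_yx, 4))
print(round(a_xy, 4), round(b_xy, 4))
```

For these data ∑X = 31, ∑Y = 24, ∑XY = 83, ∑X^2 = 153 and ∑Y^2 = 108, which gives b = –80 ÷ 263 for Y on X and b = –80 ÷ 288 for X on Y.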

 

Regression Equation by the Method of Regression Coefficients:

(i) Regression Equation of X on Y:

X − Mean of X = bXY (Y − Mean of Y)

(ii) Regression Equation of Y on X:

Y − Mean of Y = bYX (X − Mean of X)

 

Here, bXY and bYX are called Regression Coefficients. These Regression Coefficients can be calculated by the following formulas:

 

bXY (Regression Coefficient of X on Y):

1. bXY = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑Y^2)/n − {(∑Y)/n}^2]

2. bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y^2 − (∑Y)^2]

3. bXY = [Cov(X,Y)] ÷ [(σY)^2]

4. bXY = r [(σX)/(σY)]

5. bXY = buv

 

Where,

buv = [n∑uv − (∑u)(∑v)] ÷ [n∑v^2 − (∑v)^2]

u = (X − Assumed Mean of X-variable), and

v = (Y − Assumed Mean of Y-variable).

 

bYX (Regression Coefficient of Y on X):

1. bYX = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑X^2)/n − {(∑X)/n}^2]

2. bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X^2 − (∑X)^2]

3. bYX = [Cov(X,Y)] ÷ [(σX)^2]

4. bYX = r [(σY)/(σX)]

5. bYX = bvu

 

Where,

bvu = [n∑uv − (∑u)(∑v)] ÷ [n∑u^2 − (∑u)^2]

u = (X − Assumed Mean of X-variable), and

v = (Y − Assumed Mean of Y-variable).
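A short sketch of formula 2 for both regression coefficients, together with a check of formula 5 (deviations from assumed means leave the coefficients unchanged). The data are those of Problem 20 in Part B; the assumed means 4 and 3 are arbitrary values chosen for illustration, and the function name is hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from raw sums (formula 2 in each list)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy

# Data of Problem 20 in Part B
x = [1, 5, 3, 2, 1, 2, 7, 3]
y = [6, 1, 0, 0, 1, 2, 1, 5]
b_yx, b_xy = regression_coefficients(x, y)

# Formula 5: deviations from assumed means (4 and 3 are arbitrary choices)
u = [xi - 4 for xi in x]
v = [yi - 3 for yi in y]
b_vu, b_uv = regression_coefficients(u, v)

print(round(b_yx, 4), round(b_xy, 4))
print(round(b_vu, 4), round(b_uv, 4))
```

Both lines print the same pair of values. Since the means of these data are 3 and 2, the regression equations follow as Y − 2 = bYX (X − 3) and X − 3 = bXY (Y − 2).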

 

Important Properties of Linear Regression

 

1.       The product of the two regression coefficients is equal to the square of the correlation coefficient. Mathematically, (bYX) × (bXY) = r^2.

2.       r, bYX and bXY all have the same sign. If the correlation coefficient ‘r’ is zero, the regression coefficients bYX and bXY are also zero.

3.       The regression lines always intersect at the point (Mean of X, Mean of Y). With X plotted on the horizontal axis, the slopes of the regression line of Y on X and the regression line of X on Y are respectively bYX and 1/bXY.

4.       The angle between the two regression lines depends on the correlation coefficient ‘r’. When r = 0, the two lines are perpendicular to each other; when r = + 1, or r = − 1, they coincide. As ‘r’ increases numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
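These properties can be checked numerically on any bivariate data. The sketch below uses the ages of Problem 24 in Part B; the angle is computed from the two slopes bYX and 1/bXY with the standard formula for the acute angle between two lines, which is an addition of this sketch and assumes that r is neither 0 nor such that the lines are exactly perpendicular.

```python
from math import atan, degrees, sqrt

def summary(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy, r) from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)

# Data of Problem 24 in Part B
husband = [25, 22, 28, 26, 35, 20, 22, 40, 20, 18]
wife = [18, 15, 20, 17, 22, 14, 16, 21, 15, 14]

mean_x, mean_y, b_yx, b_xy, r = summary(husband, wife)

# Property 1: the product of the regression coefficients equals r squared
product = b_yx * b_xy

# Property 4: acute angle between the lines with slopes b_yx and 1 / b_xy
m1, m2 = b_yx, 1 / b_xy
angle = degrees(atan(abs((m1 - m2) / (1 + m1 * m2))))

print(round(product, 4), round(r ** 2, 4))
print(round(angle, 2))
```

The first line prints two equal numbers, and the angle is small because the ages are strongly and positively correlated.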

 

Other Important Formulas

1. Coefficient of determination = r^2

2. Coefficient of non-determination = 1 – r^2

3. Coefficient of concurrent deviation, rc = ± [± {(2c – m)/m}]^(1/2)

Where,

c = No. of concurrent deviations (i.e., the number of (+)ve signs in the product of deviations column), and

m = Total number of deviations (i.e., 1 less than the number of pairs).

 

Important notes:

1. If (2c – m) > 0, both outside and inside the square root the sign will be (+)ve, and

2. If (2c – m) < 0, both outside and inside the square root the sign will be (−)ve.
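A minimal sketch of the coefficient of concurrent deviations, applying the sign rule from the notes above. It assumes that each deviation is the direction of change from one observation to the next, and that a pair with no change in either series is not counted as concurrent (a convention chosen here, not stated in the text). The data are those of Problem 1 in Part B.

```python
from math import sqrt

def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)

def concurrent_deviation_coefficient(xs, ys):
    # Direction of change between successive observations in each series
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    m = len(dx)  # one less than the number of pairs
    # Concurrent deviations: both series move in the same direction
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)
    inner = (2 * c - m) / m
    # The same sign is used inside and outside the square root
    return sign(inner) * sqrt(abs(inner))

# Data of Problem 1 in Part B
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

rc = concurrent_deviation_coefficient(x, y)
print(rc)  # -1.0
```

Here x falls, rises, falls and rises while y does the opposite each time, so c = 0 and m = 4. Then (2c – m)/m = –1, the negative sign applies both inside and outside the root, and rc = –1.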

 


Part B


Business Statistics

Correlation and Regression

Selected Problems and Solutions

 

Correlation Coefficient

Problem: 1

Obtain the correlation coefficient from the following:

x: 6, 2, 10, 4, 8

y: 9, 11, 5, 8, 7

 

Solution: 1 



Problem: 2

Calculate the coefficient of correlation for the ages of husband and wife as given below:

Age of husband: 23, 27, 28, 29, 30, 31, 33, 35, 36, 39

Age of wife: 18, 22, 23, 24, 25, 26, 28, 29, 30, 32

 

 Solution: 2

 


Problem: 3

From the following figures, calculate the coefficient of correlation between the income and the general level of prices:

Income (X): 360, 420, 500, 550, 600, 640, 680, 720, 750

General level of prices (Y): 100, 104, 115, 160, 180, 290, 300, 320, 330

 

Solution : 3



Problem: 4

The following table gives the index numbers of industrial production in a country and the number of registered unemployed persons in the same country during eight consecutive years. Calculate the coefficient of correlation between the two variables:

Year: 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961

Index of industrial production: 100, 102, 103, 105, 106, 104, 103, 98

No. of registered unemployed persons (in ’000): 10.5, 11.4, 13.0, 11.5, 12.0, 12.5, 15.6, 20.8

 

 Solution: 4

 


Problem: 5

Calculate the coefficient of correlation from the following data:

x: 2.52, 2.49, 2.49, 2.45, 2.43, 2.42, 2.41, 2.40

y: 730, 710, 770, 890, 970, 1020, 970, 1040

 

Solution: 5



Problem: 6

Determine the correlation coefficient between x and y:

x: 5, 7, 9, 11, 13, 15

y: 1.7, 2.4, 2.8, 3.4, 3.7, 4.4

 

Solution: 6



Problem: 7

Calculate the coefficient of correlation from the following data:

Export of raw cotton (Rs in crores): 42, 44, 58, 55, 89, 98, 66

Import of manufactured goods (Rs in crores): 56, 49, 53, 58, 65, 76, 58

 

Calculate also the standard error of the coefficient of correlation.

 

Solution: 7
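As a rough numerical check rather than a worked solution, the sketch below computes r for these data and then its standard error. The standard-error formula is not stated in Part A; the sketch assumes the usual textbook form SE = (1 − r^2) ÷ √n. The function and variable names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

def standard_error(r, n):
    """Standard error of r, assuming SE = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

exports = [42, 44, 58, 55, 89, 98, 66]        # raw cotton, Rs in crores
imports_goods = [56, 49, 53, 58, 65, 76, 58]  # manufactured goods, Rs in crores

r = pearson_r(exports, imports_goods)
se = standard_error(r, len(exports))
print(round(r, 3), round(se, 3))
```

With ∑X = 452, ∑Y = 415, ∑XY = 27833, ∑X^2 = 31970 and ∑Y^2 = 25075, r comes to about 0.90 and the standard error to about 0.07.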

 


Problem: 8

The following data give the hardness (x) and tensile strength (y) for some specimens of a material, in certain units. Find the correlation coefficient and calculate its probable error:

x: 23.3, 17.5, 17.8, 20.7, 18.1, 20.9, 22.9, 20.8

y: 4.2, 3.8, 4.6, 3.2, 5.2, 4.7, 4.4, 5.6

 

Solution: 8
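A sketch for checking this problem numerically. The probable error is not defined in Part A; the sketch assumes the usual textbook form PE = 0.6745 × (1 − r^2) ÷ √n, and the names used are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def probable_error(r, n):
    """Probable error of r, assuming PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

hardness = [23.3, 17.5, 17.8, 20.7, 18.1, 20.9, 22.9, 20.8]
strength = [4.2, 3.8, 4.6, 3.2, 5.2, 4.7, 4.4, 5.6]

r = pearson_r(hardness, strength)
pe = probable_error(r, len(hardness))
print(round(r, 3), round(pe, 3))
```

The means are 20.25 and 4.4625, and r comes out close to zero (about –0.07) with a probable error of about 0.24. Under the common rule of thumb that a correlation numerically smaller than its probable error is not significant, these data would show no evidence of linear correlation.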

 


Problem: 9

Marks of 10 students in Mathematics and Statistics are given below:

Mathematics (X): 32, 38, 48, 43, 40, 22, 41, 69, 35, 64

Statistics (Y): 30, 31, 38, 43, 33, 11, 27, 76, 40, 59

 

Calculate (i) correlation coefficient, and (ii) its standard error.

 

Solution: 9



Problem: 10

Find the coefficient of correlation from the following data:

X: 65, 63, 67, 64, 68, 62, 70, 66

Y: 68, 66, 68, 65, 69, 66, 68, 65

 

Solution: 10

 


Problem: 11

Calculate Pearson’s Coefficient of Correlation from the following data taking 44 and 26 as assumed means of X and Y respectively.

X: 43, 44, 46, 40, 44, 42, 45, 42, 38, 40, 42, 57

Y: 29, 31, 19, 18, 19, 27, 27, 29, 41, 30, 26, 10

 

Solution: 11

 


Problem: 12

In a contest two judges ranked eight candidates A, B, C, D, E, F, G and H in order of their preference, as shown in the following table. Find the rank correlation coefficient.

Candidates: A, B, C, D, E, F, G, H

First Judge: 5, 2, 8, 1, 4, 6, 3, 7

Second Judge: 4, 5, 7, 3, 2, 8, 1, 6

 

Solution: 12

 


Problem: 13

Compute the correlation coefficient of the following ranks of a group of students in two examinations.

Roll Nos.: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Ranks in B.Com. Exam.: 1, 5, 8, 6, 7, 4, 2, 3, 9, 10

Ranks in M.Com. Exam.: 2, 1, 5, 7, 6, 3, 4, 8, 10, 9

 

Solution: 13

 


Problem: 14

Ten students obtained the following marks in Mathematics and Statistics. Calculate the rank correlation coefficient.

Roll Nos.: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Marks in Maths.: 78, 36, 98, 25, 75, 82, 90, 62, 65, 39

Marks in Stats.: 84, 51, 91, 60, 68, 62, 86, 58, 53, 47

 

Solution: 14

 


Problem: 15

In the following table are recorded data showing the test scores made by 10 salesmen on an intelligence test and their weekly sales:

Salesmen: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Test Scores: 50, 70, 50, 60, 80, 50, 90, 50, 60, 60

Sales (Rs ’000): 25, 60, 45, 50, 45, 20, 55, 30, 45, 30

 

Calculate the rank correlation coefficient between intelligence and efficiency in salesmanship.

 

Solution: 15

 


Problem: 16

Eight students have obtained the following marks in Accountancy and Economics. Calculate the Rank Coefficient of Correlation.

Accountancy (X): 25, 30, 38, 22, 50, 70, 30, 90

Economics (Y): 50, 40, 60, 40, 30, 20, 40, 70

 

Solution: 16





Regression Equations

Problem: 17

Estimate (a) the sales for an advertising expenditure of Rs 100 lakhs and (b) the advertising expenditure for sales of Rs 47 crores from the data given below:

Sales (Rs in crores): 14, 16, 18, 20, 24, 30, 32

Adv. Exp. (Rs in lakhs): 52, 62, 65, 70, 76, 80, 78

 

Solution: 17
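A sketch for checking both estimates numerically, using the method of regression coefficients from Part A. Part (a) needs the regression of sales on advertising expenditure and part (b) the regression of advertising expenditure on sales; the function names are illustrative.

```python
def regression_line(xs, ys):
    """Return (mean_x, mean_y, slope) for the regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return mean_x, mean_y, sxy / sxx

def predict(xs, ys, x_new):
    """Estimate y at x_new from the regression of ys on xs."""
    mean_x, mean_y, slope = regression_line(xs, ys)
    return mean_y + slope * (x_new - mean_x)

sales = [14, 16, 18, 20, 24, 30, 32]  # Rs in crores
adv = [52, 62, 65, 70, 76, 80, 78]    # Rs in lakhs

sales_at_100 = predict(adv, sales, 100)  # (a) regression of sales on advertising
adv_at_47 = predict(sales, adv, 47)      # (b) regression of advertising on sales

print(round(sales_at_100, 2))  # 41.64
print(round(adv_at_47, 2))     # 102.33
```

The means are 22 (sales) and 69 (advertising), the sum of products of deviations is 384, and the sums of squared deviations are 288 and 606. This gives regression coefficients 384 ÷ 606 for sales on advertising and 384 ÷ 288 for advertising on sales, hence estimates of about Rs 41.64 crores and Rs 102.33 lakhs.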

 


Problem: 18

Past 10 years’ data on rainfall and yield of wheat in a certain village offered the following results:

Average wheat yield: 25 Qtl.

Average rainfall: 20 Cms.

Variance of wheat output: 3 Qtl.

Variance of rainfall: 5 Cms.

Correlation Coefficient: 0.65

 

Find the most likely wheat output per acre when the rainfall is 35 Cms.

 

Solution: 18
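A sketch for checking this estimate from the summary figures alone, using formula 4 for the regression coefficient of yield on rainfall, bYX = r [(σY)/(σX)]. It assumes the figures 3 and 5 are variances as stated, so that the standard deviations are their square roots; the variable names are illustrative.

```python
from math import sqrt

mean_yield, mean_rain = 25.0, 20.0  # Qtl. and Cms.
var_yield, var_rain = 3.0, 5.0      # variances, as given in the problem
r = 0.65

# Regression coefficient of yield (Y) on rainfall (X): r * sd_Y / sd_X
b_yield_on_rain = r * sqrt(var_yield) / sqrt(var_rain)

def estimate_yield(rainfall):
    """Y - Mean of Y = b_YX (X - Mean of X), rearranged for Y."""
    return mean_yield + b_yield_on_rain * (rainfall - mean_rain)

estimate = estimate_yield(35)
print(round(b_yield_on_rain, 4), round(estimate, 2))
```

The coefficient is about 0.50, so a rainfall 15 Cms. above the average raises the estimated yield by roughly 7.55 Qtl., to about 32.55 Qtl.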



Problem: 19

In trying to evaluate the effectiveness of its advertising campaign, a company compiled the following information:

 

Year: 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005

Advertisement Exp. (Rs ’000): 12, 15, 15, 23, 24, 38, 42, 48

Sales (Rs lakhs): 5, 5.6, 5.8, 7, 7.2, 8.8, 9.2, 9.5

 

Calculate the regression equation of sales on advertising expenditure. Estimate the probable sales when advertising expenditure is Rs 60,000.

 

Solution: 19



Problem: 20

Given the following bivariate data:

X: 1, 5, 3, 2, 1, 2, 7, 3

Y: 6, 1, 0, 0, 1, 2, 1, 5

 

Find the regression equations by taking deviations of items from the means of X and Y respectively.

 

Solution: 20

 


Problem: 21

Given the bivariate data:

X: 2, 6, 4, 3, 2, 2, 8, 4, and

Y: 7, 2, 1, 1, 2, 3, 2, 6.

 

(a)     Fit the regression line of Y on X and hence predict Y, if X = 20; and

(b)     Fit the regression line of X on Y and hence predict X, if Y = 5.

 

Solution: 21



Problem: 22

From the following data obtain the two regression equations:

Sales: 91, 97, 108, 121, 67, 124, 51, 73, 111, 57

Purchases: 71, 75, 69, 97, 70, 91, 39, 61, 80, 47

 

Solution: 22

 


Problem: 23

Obtain the equation of the line of regression of yield of rice (y) on water (x) from the data given in the following table:

Water in Inches (x): 12, 18, 24, 30, 36, 42, 48

Yield in Tons (y): 5.27, 5.68, 6.25, 7.21, 8.02, 8.71, 8.42

 

From the equation so obtained, estimate the most probable yield of rice for 40 inches of water.

 

Solution: 23

 


Problem: 24

Find the two lines of regression from the following data:

Age of husband: 25, 22, 28, 26, 35, 20, 22, 40, 20, 18

Age of wife: 18, 15, 20, 17, 22, 14, 16, 21, 15, 14

 

Hence, estimate (i) the age of husband when the age of wife is 19, and (ii) the age of wife when the age of husband is 30.

 

Solution: 24

 


Problem: 25

Marks obtained by 12 students in the college test (x) and the university test (y) are as follows:

x: 41, 45, 50, 68, 47, 77, 90, 100, 80, 100, 40, 43

y: 60, 63, 60, 48, 85, 56, 53, 91, 74, 98, 65, 43

 

What is your estimate of the marks a student could have obtained in the university test if he obtained 60 in the college test but was ill at the time of the university test?

 

Solution: 25



Problem: 26

Obtain the linear regression equation that you consider more relevant for the following set of paired observations:

Age: 56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60

Blood Pressure: 147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155

 

Also estimate the blood pressure of a person whose age is 45.


Solution: 26



