Business Statistics
Correlation and Regression
(Formulas and Selected Problems and Solutions)
In measures of central tendency and measures of dispersion we study the characteristics of one variable only, e.g. the mean of the distribution of heights of the students of a class, the standard deviation of the weights of the students, etc. However, many situations require us to study two variables simultaneously. For example, the variables may be:
(i) The amount of rainfall and yield of a certain crop,
(ii) The height and weight of a group of children,
(iii) Income and expenditure of several families,
(iv) Ages of husband and wife,
(v) Rise / fall of temperature and increase / decrease in sale of cold drinks,
(vi) General income level of a country and percentage of literate population of the country, etc.
Two variables may need to be studied simultaneously for the following two objectives:
1. To measure numerically the strength of association and nature of relationship between the two variables (This is the problem of correlation), and
2. To make estimates or predictions regarding the principal variable when the value of the other variable is known (This is the problem of regression).
Thus, in
short, correlation is concerned with the measurement of the ‘strength of
association’ between the variables; while regression is concerned with the
‘prediction’ of the most likely value of one variable when the value of the
other variable is known.
In simple
correlation (also called linear correlation) the strength of linear type of
relationship between the variables is considered. Similarly, in simple
regression (also called linear regression) the linear equation between the
variables is considered.
Correlation
The word
“correlation” is used to denote the degree
of association between variables. If two variables ‘x’ and ‘y’ are so
related that variations in the magnitude of one variable tend to be accompanied
by variations in the magnitude of the other variable, they are said to be correlated. If ‘y’ tends to increase as
‘x’ increases, the variables are said to be positively
correlated. If ‘y’ tends to decrease as ‘x’ increases, the variables are
said to be negatively correlated. If
the values of ‘y’ are not affected by changes in the values of ‘x’, the
variables are said to be uncorrelated.
The linear correlation or simple correlation (i.e. the degree of association between two variables) is measured by the Correlation Coefficient.
Formulas of Correlation
Pearson’s formulas for Correlation Coefficient (r):
DIRECT METHOD
If ‘x’ and ‘y’ are two variables, the Correlation Coefficient between them, r, is given by any of the following equivalent formulas:
1. r = [cov(x, y)] ÷ [σx σy], where cov(x, y) is the covariance of x and y, and σx and σy are their standard deviations.
2. r = [∑(x – Mean of x)(y – Mean of y)] ÷ [∑{(x – Mean of x)^2}.∑{(y – Mean of y)^2}]^(1/2)
3. r = [∑xy – n(Mean of x)(Mean of y)] ÷ [{∑x^2 – n(Mean of x)^2}.{∑y^2 – n(Mean of y)^2}]^(1/2)
4. r = [n∑xy – (∑x).(∑y)] ÷ [{n∑x^2 – (∑x)^2}.{n∑y^2 – (∑y)^2}]^(1/2)
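As a minimal sketch, formulas 2 and 4 above can be checked against each other in Python. The function names and the sample data are illustrative assumptions, not part of the original notes.

```python
from math import sqrt


def pearson_r_raw_sums(xs, ys):
    """Formula 4: r computed from raw sums only (no means needed)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either series is constant")
    return numerator / denominator


def pearson_r_deviations(xs, ys):
    """Formula 2: r computed from deviations about the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Illustrative data (assumed, not taken from the problems in these notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r_raw_sums(xs, ys), 4))    # 0.7746
print(round(pearson_r_deviations(xs, ys), 4))  # 0.7746
```

Both forms give the same value because formula 4 is formula 2 with numerator and denominator multiplied by n.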
SHORT-CUT METHOD
If ‘x’ and ‘y’ are two variables, and X = x – c and Y = y – d (where c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = rXY.
STEP DEVIATION METHOD
If ‘x’ and ‘y’ are two variables, and u = (x – a)/b and v = (y – c)/d (where a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = ± ruv (+ when b and d have the same sign, − when they have opposite signs).
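The sign rule of the step deviation method can be illustrated with a short Python sketch. The data, the chosen origins and scales, and the helper names are assumptions made for the illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def step_deviations(values, origin, scale):
    """u = (x - origin) / scale for every observation."""
    return [(value - origin) / scale for value in values]


# Illustrative data (assumed).
xs = [360, 420, 500, 550, 600]
ys = [100, 104, 115, 160, 180]

r_xy = pearson_r(xs, ys)
u = step_deviations(xs, origin=500, scale=10)
v_same_sign = step_deviations(ys, origin=115, scale=5)       # b and d both positive
v_opposite_sign = step_deviations(ys, origin=115, scale=-5)  # d negative

r_same = pearson_r(u, v_same_sign)          # equals r_xy
r_opposite = pearson_r(u, v_opposite_sign)  # equals -r_xy
print(round(r_xy, 6), round(r_same, 6), round(r_opposite, 6))
```

Changing the origin never affects r; changing the scale affects only its sign, and only when exactly one of the two scale factors is negative.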
Spearman’s formulas for Correlation Coefficient (r):
RANK CORRELATION
Formula 1:
r = 1 – [{6∑(d^2)} ÷ {(n^3) – n}]
Where, d = Differences of the ranks of the respective individual observations, and n = Number of individual observations.
Formula 2:
r = 1 – [6{∑(d^2) + (1/12)∑{(m^3) – m}} ÷ {(n^3) – n}]
Where, d = Differences of the ranks of the respective individual observations, n = Number of individual observations, and m = Number of individual observations involved in a tie, whether in the first or second series.
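A hedged Python sketch of both rank correlation formulas follows. It assumes that rank 1 is given to the largest value and that tied values share the average of the ranks they occupy; the function names are illustrative.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share their mean rank."""
    ordered = sorted(values, reverse=True)
    counts = Counter(values)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def tie_correction(values):
    """(1/12) * sum of (m^3 - m) over every group of m tied observations."""
    return sum(m ** 3 - m for m in Counter(values).values() if m > 1) / 12


def spearman_from_ranks(ranks_1, ranks_2, correction=0.0):
    """Formula 1 when correction is 0, Formula 2 otherwise."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


def spearman_from_values(xs, ys):
    """Rank the raw values, then apply Formula 2 with the tie correction."""
    correction = tie_correction(xs) + tie_correction(ys)
    return spearman_from_ranks(average_ranks(xs), average_ranks(ys), correction)


# Ranks given by the two judges in Problem 12 (no ties, so Formula 1 applies).
judge_1 = [5, 2, 8, 1, 4, 6, 3, 7]
judge_2 = [4, 5, 7, 3, 2, 8, 1, 6]
print(round(spearman_from_ranks(judge_1, judge_2), 4))  # 0.6667
```

For data given as marks rather than ranks (as in Problems 14 to 16), `spearman_from_values` ranks each series first and adds the correction term for every group of tied marks.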
Important Properties of Correlation Coefficient
1. The correlation coefficient ‘r’ is independent of the change of both origin and scale of the observations. Owing to this property, if ‘x’ and ‘y’ are two variables, and u = (x – a)/b and v = (y – c)/d (where a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = ± ruv (+ when b and d have the same sign, − when they have opposite signs).
2. The correlation coefficient ‘r’ is a pure number and is independent of the units of measurement.
3. The correlation coefficient ‘r’ lies between (− 1) and (+ 1); i.e. ‘r’ cannot exceed 1 numerically. That is, mathematically, (− 1) ≤ r ≤ (+ 1).
Regression
The word “regression” is used to denote estimation or
prediction of the average value of one variable for a specified value of the
other variable. The estimation is done by means of suitable equations, derived
on the basis of available bivariate data. Such an equation is known as a Regression Equation.
In linear regression (or simple regression) the
relationship between the variables is assumed to be linear.
Formulas of Regression
Regression Equation by the Method of Normal Equations
(i) For the Regression Equation of X on Y, i.e. X = a + bY, the Normal Equations are:
∑X = na + b∑Y, and
∑XY = a∑Y + b∑(Y^2)
(ii) For the Regression Equation of Y on X, i.e. Y = a + bX, the Normal Equations are:
∑Y = na + b∑X, and
∑XY = a∑X + b∑(X^2)
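The normal equations are two linear equations in the two unknowns a and b, so they can be solved directly. The sketch below uses Cramer's rule; the function names and the sample data are assumptions for illustration.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations of Y = a + bX for (a, b) by Cramer's rule."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # System:  n*a     + sum_x*b  = sum_y
    #          sum_x*a + sum_x2*b = sum_xy
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all X values are equal; the line is not determined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


def fit_x_on_y(xs, ys):
    """X = a + bY: the same system with the roles of X and Y exchanged."""
    return fit_y_on_x(ys, xs)


# Illustrative data (assumed, not taken from the problems in these notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)
print(round(a, 4), round(b, 4))  # 2.2 0.6
```

Here the fitted line of Y on X is Y = 2.2 + 0.6X, which passes through the point of means (3, 4).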
Regression Equation by the Method of Regression Coefficients:
(i) Regression Equation of X on Y:
X − Mean of X = bXY (Y − Mean of Y)
(ii) Regression Equation of Y on X:
Y − Mean of Y = bYX (X − Mean of X)
Here, bXY and bYX are called Regression Coefficients. These Regression Coefficients can be calculated by the following formulas:
Formulas for bXY:
1. bXY = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑Y^2)/n − {(∑Y)/n}^2]
2. bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y^2 − (∑Y)^2]
3. bXY = [Cov(X,Y)] ÷ [(σY)^2]
4. bXY = r [(σX)/(σY)]
5. bXY = buv
Where, buv = [n∑uv − (∑u)(∑v)] ÷ [n∑v^2 − (∑v)^2], u = (X − Assumed Mean of X-variable), and v = (Y − Assumed Mean of Y-variable).
Formulas for bYX:
1. bYX = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑X^2)/n − {(∑X)/n}^2]
2. bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X^2 − (∑X)^2]
3. bYX = [Cov(X,Y)] ÷ [(σX)^2]
4. bYX = r [(σY)/(σX)]
5. bYX = bvu
Where, bvu = [n∑uv − (∑u)(∑v)] ÷ [n∑u^2 − (∑u)^2], u = (X − Assumed Mean of X-variable), and v = (Y − Assumed Mean of Y-variable).
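As a rough sketch, formula 2 for each regression coefficient and the two regression equations can be written in Python as follows. The function names and the data are illustrative assumptions.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) using formula 2 for each coefficient."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    cross = n * sum_xy - sum_x * sum_y          # shared numerator
    b_yx = cross / (n * sum_x2 - sum_x ** 2)    # regression of Y on X
    b_xy = cross / (n * sum_y2 - sum_y ** 2)    # regression of X on Y
    return b_yx, b_xy


def predict_y(xs, ys, x_new):
    """Y - Mean of Y = b_yx (X - Mean of X), solved for Y."""
    b_yx, _ = regression_coefficients(xs, ys)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return mean_y + b_yx * (x_new - mean_x)


def predict_x(xs, ys, y_new):
    """X - Mean of X = b_xy (Y - Mean of Y), solved for X."""
    _, b_xy = regression_coefficients(xs, ys)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return mean_x + b_xy * (y_new - mean_y)


# Illustrative data (assumed).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(xs, ys)
print(round(b_yx, 4), round(b_xy, 4))  # 0.6 1.0
```

Formula 5 says the coefficients are unchanged when deviations from assumed means are used; calling `regression_coefficients` on `[x - 3 for x in xs]` and `[y - 4 for y in ys]` returns the same pair.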
Important Properties of Linear Regression
1. The product of the two regression coefficients is equal to the square of the correlation coefficient. Mathematically, (bYX) × (bXY) = r^2.
2. r, bYX and bXY all have the same sign. If the correlation coefficient ‘r’ is zero, the regression coefficients bYX and bXY are also zero.
3. The regression lines always intersect at the point (Mean of X, Mean of Y). The slopes of the regression line of Y on X and the regression line of X on Y are respectively bYX and 1/bXY.
4. The angle between the two regression lines depends on the correlation coefficient ‘r’. When r = 0, the two lines are perpendicular to each other; when r = + 1 or r = − 1, they coincide. As ‘r’ increases numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
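Properties 1 and 4 can be explored numerically. The sketch below assumes the coefficients are obtained from r and the two standard deviations (formula 4 for each coefficient) and measures the acute angle between the lines from their direction vectors; the function name is illustrative.

```python
from math import acos, degrees, hypot


def angle_between_regression_lines(r, sigma_x, sigma_y):
    """Acute angle (in degrees) between the two regression lines."""
    b_yx = r * sigma_y / sigma_x
    b_xy = r * sigma_x / sigma_y
    # Direction vectors in the (X, Y) plane.
    d1 = (1.0, b_yx)  # Y on X: Y changes by b_yx per unit of X
    d2 = (b_xy, 1.0)  # X on Y: X changes by b_xy per unit of Y
    dot = abs(d1[0] * d2[0] + d1[1] * d2[1])
    cos_theta = dot / (hypot(*d1) * hypot(*d2))
    # Guard against rounding slightly above 1 before taking acos.
    return degrees(acos(min(1.0, cos_theta)))


# Assumed standard deviations, used only for illustration.
for r in (0.0, 0.2, 0.5, 0.9, 1.0):
    print(r, round(angle_between_regression_lines(r, 2.0, 3.0), 1))
```

Using direction vectors avoids dividing by bXY, so the case r = 0 (where the slope 1/bXY is undefined) needs no special handling.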
Other Important Formulas
1. Coefficient of determination = r^2
2. Coefficient of non-determination = 1 – r^2
3. Coefficient of concurrent deviation, rc = ± [± {(2c – m)/m}]^(1/2)
Where, c = No. of concurrent deviations (i.e., the number of (+)ve signs in the product of deviations column), and m = Total number of deviations (i.e., 1 less than the number of pairs).
Important notes:
1. If (2c – m) > 0, the sign will be (+)ve both outside and inside the square root, and
2. If (2c – m) < 0, the sign will be (−)ve both outside and inside the square root.
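The coefficient of concurrent deviation can be sketched in Python as below. This assumes that a deviation is the direction of change from one observation to the next, and that only strictly positive products of signs count as concurrent; the names and the data are illustrative.

```python
from math import sqrt


def sign(value):
    """Return +1, 0 or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def concurrent_deviation_coefficient(xs, ys):
    """rc = ± sqrt(± (2c - m) / m), with the same sign inside and outside."""
    # Direction of change between successive observations in each series.
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    m = len(dx)  # one less than the number of pairs
    if m == 0:
        raise ValueError("need at least two pairs of observations")
    # Concurrent deviations: both series moved in the same direction.
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)
    ratio = (2 * c - m) / m
    return sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)


# Illustrative data (assumed): c = 3 concurrent deviations out of m = 4.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(concurrent_deviation_coefficient(xs, ys), 4))  # 0.7071
```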
Selected Problems and Solutions
Correlation Coefficient
Problem: 1
Obtain the correlation coefficient from the following:
x | 6 | 2 | 10 | 4 | 8
y | 9 | 11 | 5 | 8 | 7
Solution: 1
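One way to check the answer to Problem 1, assuming the deviation form of Pearson's formula (formula 2 of the direct method), is the following Python sketch. The variable names are illustrative.

```python
from math import sqrt

# Data of Problem 1.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

n = len(x)
mean_x = sum(x) / n  # 6.0
mean_y = sum(y) / n  # 8.0

# Deviations of each observation from its mean.
dx = [value - mean_x for value in x]
dy = [value - mean_y for value in y]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))  # -26.0
sum_dx2 = sum(a * a for a in dx)               # 40.0
sum_dy2 = sum(b * b for b in dy)               # 20.0

r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(round(r, 4))  # -0.9192
```

The value is close to −1, which indicates a strong negative linear relationship between x and y.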
Problem: 2
Calculate the coefficient of correlation for the ages of husband and wife as given below:
Age of husband | 23 | 27 | 28 | 29 | 30 | 31 | 33 | 35 | 36 | 39
Age of wife | 18 | 22 | 23 | 24 | 25 | 26 | 28 | 29 | 30 | 32
Solution: 2
Problem: 3
From the following figures, calculate the coefficient of correlation between the income and the general level of prices:
Income (X) | 360 | 420 | 500 | 550 | 600 | 640 | 680 | 720 | 750
General level of prices (Y) | 100 | 104 | 115 | 160 | 180 | 290 | 300 | 320 | 330
Solution: 3
Problem: 4
The following table gives the index numbers of industrial production in a country and the number of registered unemployed persons in the same country during eight consecutive years. Calculate the coefficient of correlation between the two variables:
Year | 1954 | 1955 | 1956 | 1957 | 1958 | 1959 | 1960 | 1961
Index of industrial production | 100 | 102 | 103 | 105 | 106 | 104 | 103 | 98
No. of registered unemployed persons (in ’000) | 10.5 | 11.4 | 13.0 | 11.5 | 12.0 | 12.5 | 15.6 | 20.8
Solution: 4
Problem: 5
Calculate the coefficient of correlation from the following data:
x | 2.52 | 2.49 | 2.49 | 2.45 | 2.43 | 2.42 | 2.41 | 2.40
y | 730 | 710 | 770 | 890 | 970 | 1020 | 970 | 1040
Solution: 5
Problem: 6
Determine the correlation coefficient between x and y:
x | 5 | 7 | 9 | 11 | 13 | 15
y | 1.7 | 2.4 | 2.8 | 3.4 | 3.7 | 4.4
Solution: 6
Problem: 7
Calculate the coefficient of correlation from the following data:
Export of raw cotton (Rs in crores) | 42 | 44 | 58 | 55 | 89 | 98 | 66
Import of manufactured goods (Rs in crores) | 56 | 49 | 53 | 58 | 65 | 76 | 58
Calculate also the standard error of the coefficient of correlation.
Solution: 7
Problem: 8
The following data give the hardness (x) and tensile strength (y) for some specimens of a material, in certain units. Find the correlation coefficient and calculate its probable error:
x | 23.3 | 17.5 | 17.8 | 20.7 | 18.1 | 20.9 | 22.9 | 20.8
y | 4.2 | 3.8 | 4.6 | 3.2 | 5.2 | 4.7 | 4.4 | 5.6
Solution: 8
Problem: 9
Marks of 10 students in Mathematics and Statistics are given below:
Mathematics (X) | 32 | 38 | 48 | 43 | 40 | 22 | 41 | 69 | 35 | 64
Statistics (Y) | 30 | 31 | 38 | 43 | 33 | 11 | 27 | 76 | 40 | 59
Calculate (i) the correlation coefficient, and (ii) its standard error.
Solution: 9
Problem: 10
Find the coefficient of correlation from the following data:
X | 65 | 63 | 67 | 64 | 68 | 62 | 70 | 66
Y | 68 | 66 | 68 | 65 | 69 | 66 | 68 | 65
Solution: 10
Problem: 11
Calculate Pearson’s Coefficient of Correlation from the following data, taking 44 and 26 as assumed means of X and Y respectively.
X | 43 | 44 | 46 | 40 | 44 | 42 | 45 | 42 | 38 | 40 | 42 | 57
Y | 29 | 31 | 19 | 18 | 19 | 27 | 27 | 29 | 41 | 30 | 26 | 10
Solution: 11
Problem: 12
In a contest two judges ranked eight candidates A, B, C, D, E, F, G and H in order of their preference, as shown in the following table. Find the rank correlation coefficient.
Candidates | A | B | C | D | E | F | G | H
First Judge | 5 | 2 | 8 | 1 | 4 | 6 | 3 | 7
Second Judge | 4 | 5 | 7 | 3 | 2 | 8 | 1 | 6
Solution: 12
Problem: 13
Compute the correlation coefficient of the following ranks of a group of students in two examinations.
Roll Nos. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Ranks in B.Com. Exam. | 1 | 5 | 8 | 6 | 7 | 4 | 2 | 3 | 9 | 10
Ranks in M.Com. Exam. | 2 | 1 | 5 | 7 | 6 | 3 | 4 | 8 | 10 | 9
Solution: 13
Problem: 14
Ten students obtained the following marks in Mathematics and Statistics. Calculate the rank correlation coefficient.
Roll Nos. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks in Maths. | 78 | 36 | 98 | 25 | 75 | 82 | 90 | 62 | 65 | 39
Marks in Stats. | 84 | 51 | 91 | 60 | 68 | 62 | 86 | 58 | 53 | 47
Solution: 14
Problem: 15
The following table records the test scores made by 10 salesmen on an intelligence test and their weekly sales:
Salesmen | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Test Scores | 50 | 70 | 50 | 60 | 80 | 50 | 90 | 50 | 60 | 60
Sales (Rs ’000) | 25 | 60 | 45 | 50 | 45 | 20 | 55 | 30 | 45 | 30
Calculate the rank correlation coefficient between intelligence and efficiency in salesmanship.
Solution: 15
Problem: 16
Eight students have obtained the following marks in Accountancy and Economics. Calculate the Rank Coefficient of Correlation.
Accountancy (X) | 25 | 30 | 38 | 22 | 50 | 70 | 30 | 90
Economics (Y) | 50 | 40 | 60 | 40 | 30 | 20 | 40 | 70
Solution: 16
Regression Equations
Problem: 17
Estimate (a) the sales for an advertising expenditure of Rs 100 lakhs and (b) the advertising expenditure for sales of Rs 47 crores from the data given below:
Sales (Rs in crores) | 14 | 16 | 18 | 20 | 24 | 30 | 32
Adv. Exp. (Rs in lakhs) | 52 | 62 | 65 | 70 | 76 | 80 | 78
Solution: 17
Problem: 18
Past 10 years’ data on rainfall and yield of wheat in a certain village offered the following results:
Average wheat yield | 25 Qtl.
Average rainfall | 20 Cms.
Variance of wheat output | 3 Qtl.
Variance of rainfall | 5 Cms.
Correlation Coefficient | 0.65
Find the most likely wheat output per acre when the rainfall is 35 Cms.
Solution: 18
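A hedged sketch of one way to approach Problem 18 in Python: it assumes the figures 3 and 5 are variances (so the standard deviations are their square roots), treats rainfall as X and wheat yield as Y, and uses bYX = r (σY)/(σX) in the regression equation of Y on X. The variable names are illustrative.

```python
from math import sqrt

# Summary figures of Problem 18 (Y = wheat yield, X = rainfall).
mean_yield = 25.0  # Qtl.
mean_rain = 20.0   # Cms.
var_yield = 3.0    # assumed to be the variance of Y
var_rain = 5.0     # assumed to be the variance of X
r = 0.65

# Regression coefficient of Y on X: b_yx = r * sigma_y / sigma_x.
b_yx = r * sqrt(var_yield) / sqrt(var_rain)


def estimate_yield(rainfall_cm):
    """Y - Mean of Y = b_yx (X - Mean of X), solved for Y."""
    return mean_yield + b_yx * (rainfall_cm - mean_rain)


estimate = estimate_yield(35.0)
print(round(b_yx, 4), round(estimate, 2))  # 0.5035 32.55
```

Under these assumptions the most likely yield at 35 Cms. of rainfall is about 32.55 Qtl. Since 35 Cms. lies well above the average rainfall of 20 Cms., the figure is an extrapolation of the fitted line.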
Problem: 19
In trying to evaluate the effectiveness of its advertising campaign, a company compiled the following information:
Year | Advertisement Exp. (Rs ’000) | Sales (Rs lakhs)
1998 | 12 | 5
1999 | 15 | 5.6
2000 | 15 | 5.8
2001 | 23 | 7
2002 | 24 | 7.2
2003 | 38 | 8.8
2004 | 42 | 9.2
2005 | 48 | 9.5
Calculate the regression equation of sales on advertising expenditure. Estimate the probable sales when advertising expenditure is Rs 60,000.
Solution: 19
Problem: 20
Given the following bivariate data:
X | 1 | 5 | 3 | 2 | 1 | 2 | 7 | 3
Y | 6 | 1 | 0 | 0 | 1 | 2 | 1 | 5
Find the regression equations by taking deviations of items from the means of X and Y respectively.
Solution: 20
Problem: 21
Given the bivariate data:
X: 2, 6, 4, 3, 2, 2, 8, 4, and
Y: 7, 2, 1, 1, 2, 3, 2, 6.
(a) Fit the regression line of Y on X and hence predict Y, if X = 20; and
(b) Fit the regression line of X on Y and hence predict X, if Y = 5.
Solution: 21
Problem: 22
From the following data obtain the two regression equations:
Sales | 91 | 97 | 108 | 121 | 67 | 124 | 51 | 73 | 111 | 57
Purchases | 71 | 75 | 69 | 97 | 70 | 91 | 39 | 61 | 80 | 47
Solution: 22
Problem: 23
Obtain the equation of the line of regression of yield of rice (y) on water (x) from the data given in the following table:
Water in Inches (x) | 12 | 18 | 24 | 30 | 36 | 42 | 48
Yield in Tons (y) | 5.27 | 5.68 | 6.25 | 7.21 | 8.02 | 8.71 | 8.42
From the equation so obtained, estimate the most probable yield of rice for 40 inches of water.
Solution: 23
Problem: 24
Find the two lines of regression from the following data:
Age of husband | 25 | 22 | 28 | 26 | 35 | 20 | 22 | 40 | 20 | 18
Age of wife | 18 | 15 | 20 | 17 | 22 | 14 | 16 | 21 | 15 | 14
Hence, estimate (i) the age of husband when the age of wife is 19, and (ii) the age of wife when the age of husband is 30.
Solution: 24
Problem: 25
Marks obtained by 12 students in the college test (x) and the university test (y) are as follows:
x | 41 | 45 | 50 | 68 | 47 | 77 | 90 | 100 | 80 | 100 | 40 | 43
y | 60 | 63 | 60 | 48 | 85 | 56 | 53 | 91 | 74 | 98 | 65 | 43
What is your estimate of the marks a student could have obtained in the university test if he obtained 60 in the college test but was ill at the time of the university test?
Solution: 25
Problem: 26
Obtain the linear regression equation that you consider more relevant for the following set of paired observations:
Age | 56 | 42 | 72 | 36 | 63 | 47 | 55 | 49 | 38 | 42 | 68 | 60
Blood Pressure | 147 | 125 | 160 | 118 | 149 | 128 | 150 | 145 | 115 | 140 | 152 | 155
Also estimate the blood pressure of a person whose age is 45.
Solution: 26