Correlation and Regression – Formulas
In measures of central tendency and measures of dispersion we study the characteristics of one variable only, e.g. the mean of the distribution of heights of the students of a class, the standard deviation of their weights, etc. But many situations arise in which we have to study two variables simultaneously. For example, the variables may be:
(i) The amount of rainfall and the yield of a certain crop,
(ii) The height and weight of a group of children,
(iii) Income and expenditure of several families,
(iv) Ages of husband and wife,
(v) Rise/fall of temperature and increase/decrease in the sale of cold drinks,
(vi) General income level of a country and the percentage of literate population of the country, etc.
Two variables may be required to be studied simultaneously for the following two objectives:
1. To measure numerically the strength of association and nature of relationship between the two variables (This is the problem of correlation), and
2. To make estimates or predictions regarding the principal variable when the value of the other variable is known (This is the problem of regression).
Thus, in short, correlation is concerned with the measurement of the ‘strength of association’ between the variables, while regression is concerned with the ‘prediction’ of the most likely value of one variable when the value of the other variable is known.
In simple correlation (also called linear correlation) the strength of the linear relationship between the variables is considered. Similarly, in simple regression (also called linear regression) a linear equation between the variables is considered.
Correlation
The word “correlation” is used to denote the degree of association between variables. If two variables ‘x’ and ‘y’ are so related that variations in the magnitude of one variable tend to be accompanied by variations in the magnitude of the other, they are said to be correlated. If ‘y’ tends to increase as ‘x’ increases, the variables are said to be positively correlated. If ‘y’ tends to decrease as ‘x’ increases, the variables are said to be negatively correlated. If the values of ‘y’ are not affected by changes in the values of ‘x’, the variables are said to be uncorrelated.
Linear correlation or simple correlation (i.e. the degree of association between two variables) is measured by the Correlation Coefficient.
Formulas of Correlation
Pearson’s formulas for the Correlation Coefficient (r):
DIRECT METHOD
If ‘x’ and ‘y’ are two variables, the Correlation Coefficient r between them can be computed by any of the following:
1. r = Cov(x, y) ÷ (σx·σy)
2. r = [∑(x − Mean of x)(y − Mean of y)] ÷ [∑{(x − Mean of x)^2} · ∑{(y − Mean of y)^2}]^(1/2)
3. r = [∑xy − n(Mean of x)(Mean of y)] ÷ [{∑x^2 − n(Mean of x)^2} · {∑y^2 − n(Mean of y)^2}]^(1/2)
4. r = [n∑xy − (∑x)(∑y)] ÷ [{n∑x^2 − (∑x)^2} · {n∑y^2 − (∑y)^2}]^(1/2)
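As a concrete check of the direct method, here is a small Python sketch using formula 4 above; the data values are invented purely for illustration.

```python
# Pearson's r by the direct method (formula 4):
# r = [n∑xy − (∑x)(∑y)] ÷ [{n∑x² − (∑x)²}·{n∑y² − (∑y)²}]^(1/2)

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5
    return num / den

# Illustrative (made-up) data:
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # → 0.7746
```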
SHORT-CUT METHOD
If ‘x’ and ‘y’ are two variables, and X = x − c and Y = y − d (where c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = rXY.
STEP DEVIATION METHOD
If ‘x’ and ‘y’ are two variables, and u = (x − a)/b and v = (y − c)/d (where a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = ± ruv (according as b and d have the same sign, or opposite signs).
Spearman’s formulas for the Correlation Coefficient (r):
RANK CORRELATION
Formula 1 (no ties):
r = 1 − [6∑(d^2) ÷ {n^3 − n}]
Where d = difference of the ranks of the respective individual observations, and n = number of individual observations.
Formula 2 (with ties):
r = 1 − [6{∑(d^2) + (1/12)∑(m^3 − m)} ÷ {n^3 − n}]
Where d and n are as above, and m = number of individual observations involved in a tie, whether in the first or the second series.
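For the untied case, Formula 1 can be sketched in Python as follows; the ranking helper assumes all values within each series are distinct (the situation Formula 1 covers), and the data are made up.

```python
def spearman_r(x, y):
    # Formula 1: r = 1 − 6∑(d²) ÷ (n³ − n), valid when there are no ties
    n = len(x)
    def ranks(v):  # ascending ranks, 1 .. n (assumes distinct values)
        order = {val: i + 1 for i, val in enumerate(sorted(v))}
        return [order[val] for val in v]
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)

# Illustrative data (distinct values in each series):
print(spearman_r([10, 20, 30, 40, 50], [5, 6, 9, 7, 8]))  # → 0.7
```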
Important Properties of Correlation Coefficient
1. The correlation coefficient ‘r’ is independent of the change of both origin and scale of the observations. Owing to this property, if ‘x’ and ‘y’ are two variables, and u = (x − a)/b and v = (y − c)/d (where a, b, c and d are constants), the Correlation Coefficient between ‘x’ and ‘y’ is rxy = ± ruv (according as b and d have the same sign, or opposite signs).
2. The correlation coefficient ‘r’ is a pure number and is independent of the units of measurement.
3. The correlation coefficient ‘r’ lies between −1 and +1; i.e. ‘r’ cannot exceed 1 numerically. That is, mathematically, −1 ≤ r ≤ +1.
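Property 1 is easy to verify numerically. The sketch below, with invented data and arbitrary constants, checks that r is unchanged under a change of origin and a same-sign change of scale, and flips sign when the scale factors have opposite signs.

```python
def r(x, y):
    # Pearson's r, formula 4 of the direct method
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = ((n * sum(a * a for a in x) - sum(x) ** 2)
           * (n * sum(b * b for b in y) - sum(y) ** 2)) ** 0.5
    return num / den

x = [2, 4, 6, 8]
y = [1, 3, 2, 5]
u = [(xi - 5) / 2 for xi in x]    # u = (x − a)/b with b = +2
v = [(yi - 2) / 1 for yi in y]    # v = (y − c)/d with d = +1
w = [(yi - 2) / -1 for yi in y]   # same, but d = −1

print(abs(r(x, y) - r(u, v)) < 1e-12)  # True: same signs, r_xy = r_uv
print(abs(r(x, y) + r(u, w)) < 1e-12)  # True: opposite signs, r_xy = −r_uw
```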
Regression
The word “regression” is used to denote estimation or prediction of the average value of one variable for a specified value of the other variable. The estimation is done by means of suitable equations, derived on the basis of available bivariate data. Such an equation is known as a Regression Equation.
In linear regression (or simple regression) the relationship between the variables is assumed to be linear.
Formulas of Regression
Regression Equation by the Method of Normal Equations
(i) For the Regression Equation of X on Y, i.e. for the Regression Equation X = a + bY, the Normal Equations are:
∑X = na + b∑Y, and
∑XY = a∑Y + b∑(Y^2)
(ii) For the Regression Equation of Y on X, i.e. for the Regression Equation Y = a + bX, the Normal Equations are:
∑Y = na + b∑X, and
∑XY = a∑X + b∑(X^2)
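The two normal equations for Y on X form a 2×2 linear system in a and b, so they can be solved directly. A minimal Python sketch (with made-up data) does this by Cramer's rule:

```python
def fit_y_on_x(x, y):
    # Solve  ∑Y = na + b∑X  and  ∑XY = a∑X + b∑(X²)  for a and b
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    det = n * sx2 - sx * sx
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

# Illustrative data: the fitted line is Y = 0.5 + 1.4X
a, b = fit_y_on_x([1, 2, 3, 4], [2, 3, 5, 6])
print(round(a, 4), round(b, 4))  # → 0.5 1.4
```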
Regression Equation by the Method of Regression Coefficients
(i) Regression Equation of X on Y: X − Mean of X = bXY (Y − Mean of Y)
(ii) Regression Equation of Y on X: Y − Mean of Y = bYX (X − Mean of X)
Here, bXY and bYX are called Regression Coefficients. These Regression Coefficients can be calculated by the following formulas:
bXY:
1. bXY = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑Y^2)/n − {(∑Y)/n}^2]
2. bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y^2 − (∑Y)^2]
3. bXY = Cov(X, Y) ÷ (σY)^2
4. bXY = r (σX/σY)
5. bXY = (h/k)·buv
Where buv = [n∑uv − (∑u)(∑v)] ÷ [n∑v^2 − (∑v)^2], u = (X − Assumed Mean)/h, v = (Y − Assumed Mean)/k, and ‘h’ and ‘k’ are constants.
bYX:
1. bYX = [(∑XY)/n − {(∑X)/n}{(∑Y)/n}] ÷ [(∑X^2)/n − {(∑X)/n}^2]
2. bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X^2 − (∑X)^2]
3. bYX = Cov(X, Y) ÷ (σX)^2
4. bYX = r (σY/σX)
5. bYX = (k/h)·bvu
Where bvu = [n∑uv − (∑u)(∑v)] ÷ [n∑u^2 − (∑u)^2], u = (X − Assumed Mean)/h, v = (Y − Assumed Mean)/k, and ‘h’ and ‘k’ are constants.
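Formula 2 for bXY and bYX uses the same numerator for both coefficients, which makes them easy to compute together. A small Python sketch with made-up data:

```python
def regression_coefficients(x, y):
    # bYX = [n∑XY − (∑X)(∑Y)] ÷ [n∑X² − (∑X)²]
    # bXY = [n∑XY − (∑X)(∑Y)] ÷ [n∑Y² − (∑Y)²]
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    num = n * sxy - sx * sy
    return num / (n * sy2 - sy ** 2), num / (n * sx2 - sx ** 2)  # (bXY, bYX)

# Illustrative data:
bxy, byx = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(bxy, byx)   # → 1.0 0.6
print(bxy * byx)  # equals r² (here 0.6, so r ≈ 0.7746)
```

The last line illustrates the identity (bYX) × (bXY) = r² stated among the properties below.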
Important Properties of Linear Regression
1. The product of the two regression coefficients is equal to the square of the correlation coefficient. Mathematically, (bYX) × (bXY) = r^2.
2. r, bYX and bXY all have the same sign. If the correlation coefficient ‘r’ is zero, the regression coefficients bYX and bXY are also zero.
3. The regression lines always intersect at the point (Mean of X, Mean of Y). The slopes of the regression line of Y on X and the regression line of X on Y are respectively bYX and 1/bXY.
4. The angle between the two regression lines depends on the correlation coefficient ‘r’. When r = 0, the two lines are perpendicular to each other; when r = +1 or r = −1, they coincide. As ‘r’ increases numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
Other Important Formulas
1. Coefficient of determination = r^2
2. Coefficient of non-determination = 1 − r^2
3. Coefficient of concurrent deviation: rc = ± [± {(2c − m)/m}]^(1/2)
Where c = number of concurrent deviations (i.e., the number of (+)ve signs in the product-of-deviations column), and m = total number of deviations (i.e., 1 less than the number of pairs).
Important notes:
1. If (2c − m) > 0, the sign both outside and inside the square root is (+)ve, and
2. If (2c − m) < 0, the sign both outside and inside the square root is (−)ve.
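As an illustration, the coefficient of concurrent deviation can be sketched as below; the code assumes no two consecutive values are equal (so every deviation is a clear rise or fall), and the data are invented.

```python
def concurrent_deviation_r(x, y):
    # Signs of successive deviations: +1 for a rise, −1 for a fall
    dx = [1 if b > a else -1 for a, b in zip(x, x[1:])]
    dy = [1 if b > a else -1 for a, b in zip(y, y[1:])]
    m = len(dx)                                  # 1 less than the number of pairs
    c = sum(sx == sy for sx, sy in zip(dx, dy))  # concurrent deviations
    t = (2 * c - m) / m
    # The sign outside and inside the root follows the sign of (2c − m)
    return abs(t) ** 0.5 if t >= 0 else -((-t) ** 0.5)

# Both series rise and fall together at every step, so rc = +1
print(concurrent_deviation_r([1, 2, 3, 2, 4, 5], [2, 3, 5, 4, 6, 8]))  # → 1.0
```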