Analyzing Cross-sectional Data and Correlation and Regression

Part one

Analyzing cross-sectional data

This report describes the results of a survey of our members on the quality of the services we offer to both full members and weekend members. The survey aimed to measure customer satisfaction with our services and equipment. There were 100 participants: 55 full members and 45 weekend-only members, of whom 47 were male and 53 were female.

| Membership type | Frequency |
|---|---|
| Full membership | 55 |
| Weekend only membership | 45 |

| Gender | Code | Frequency |
|---|---|---|
| Male | 1 | 47 |
| Female | 2 | 53 |

The quality of our centre was rated on four criteria: the quality of instructors, the quality of the equipment used, the range of facilities, and the cost of membership. Together, these ratings indicate the level of customer satisfaction with our services.

For the quality of instructors, out of 100 participants 12 rated the service as bad, 35 as neither bad nor good, 39 as good, and only 14 as very good. Good was therefore the most common rating. This can be graphically represented as follows:

Quality of instructors

| Rating | Frequency |
|---|---|
| Very bad | 0 |
| Bad | 12 |
| Neither bad nor good | 35 |
| Good | 39 |
| Very good | 14 |

As the graph shows, the mode is good. The distribution is slightly negatively skewed (skewed to the left), meaning that its longer tail lies on the left while most observations sit at the higher ratings. Another notable observation is that none of the participants rated our instructors as very bad.

The mean rating was 3.55 and the sample standard deviation was 0.88048 (a variance of about 0.775), so with n = 100 the standard error of the mean is 0.88048/√100 = 0.088048. The 95% confidence interval can therefore be calculated as follows:

P(X – T Sx ≤ µ ≤ X + T Sx) = 95%

where X is the sample mean, T is the two-tailed 5% critical value from the t table and Sx is the standard error of the mean. With 99 degrees of freedom, T is approximately 1.984.

P(3.55 – 1.984 (0.088048) ≤ µ ≤ 3.55 + 1.984 (0.088048)) = 95%

Therefore we get the confidence interval as

P(3.375 ≤ µ ≤ 3.725) = 95%
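The summary statistics for the instructor ratings can be recomputed directly from the frequency table. The following is a minimal sketch, not the procedure used in the survey itself: the 1 to 5 coding of the ratings, the function name `summarize`, and the critical value 1.984 (two-tailed 5%, 99 degrees of freedom) are assumptions made for illustration.

```python
import math

# Assumed coding: 1 = very bad, 2 = bad, 3 = neither, 4 = good, 5 = very good.
# Frequencies are the instructor ratings reported in the text.
freq = {1: 0, 2: 12, 3: 35, 4: 39, 5: 14}


def summarize(freq, t_crit=1.984):
    """Return mean, sample SD, standard error and 95% CI bounds.

    t_crit is an assumed two-tailed 5% critical value of the t distribution
    for 99 degrees of freedom; it must be looked up again if n changes.
    """
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    # Sum of squared deviations, weighted by frequency.
    ss = sum(f * (x - mean) ** 2 for x, f in freq.items())
    sd = math.sqrt(ss / (n - 1))   # sample standard deviation
    se = sd / math.sqrt(n)         # standard error of the mean
    margin = t_crit * se
    return mean, sd, se, mean - margin, mean + margin


mean, sd, se, lower, upper = summarize(freq)
print(f"mean={mean:.2f} sd={sd:.5f} se={se:.6f} CI=({lower:.3f}, {upper:.3f})")
```

The same function can be applied to the other three rating tables by swapping in their frequencies.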

Our services were also rated on the quality of the equipment used. Out of 100 participants, 6 rated the equipment as very bad, 25 as bad, 33 as neither bad nor good, 27 as good, and only 9 as very good.

The most common rating was therefore neither bad nor good. The mean rating was 3.08, the standard error was 0.106059, and the median and the mode were both 3. The results can be graphically represented as follows:

Quality of any equipment used

| Rating | Frequency |
|---|---|
| Very bad | 6 |
| Bad | 25 |
| Neither bad nor good | 33 |
| Good | 27 |
| Very good | 9 |

We can construct a 95% confidence interval as follows, taking T ≈ 1.984 as the two-tailed 5% critical value of the t distribution with 99 degrees of freedom:

P(X – T Sx ≤ µ ≤ X + T Sx) = 95%

P(3.08 – 1.984 (0.106059) ≤ µ ≤ 3.08 + 1.984 (0.106059)) = 95%

P(2.870 ≤ µ ≤ 3.290) = 95%

Participants also rated the range of facilities available. Out of 100 participants, only one rated the range of facilities as very bad, 6 as bad, 20 as neither good nor bad, 38 as good, and the remaining 35 as very good. In total, therefore, 73 participants rated the range of facilities as good or very good. This is graphically shown below:

The range of facilities available

| Rating | Frequency |
|---|---|
| Very bad | 1 |
| Bad | 6 |
| Neither bad nor good | 20 |
| Good | 38 |
| Very good | 35 |

The mean was 4, which shows that on average participants rated the range of facilities as good. The mode was also 4, so good was the most common rating. A distribution whose mean, median and mode coincide is often symmetric, with deviations on either side of the mean mirroring each other; here, however, the negative skewness shows that the distribution is skewed to the left. The standard error was 0.094281, and therefore we can construct a 95% confidence interval as follows, taking T ≈ 1.984 (two-tailed 5% critical value of the t distribution, 99 degrees of freedom):

P(X – T Sx ≤ µ ≤ X + T Sx) = 95%

P(4 – 1.984 (0.094281) ≤ µ ≤ 4 + 1.984 (0.094281)) = 95%

P(3.813 ≤ µ ≤ 4.187) = 95%

The ratings of the cost of membership indicate that most participants consider the cost poor: only 3 out of 100 rated it as very good, 16 as good, 21 as neither good nor bad, 37 as bad, and 23 as very bad. The mode was bad, and the mean rating of 2.39 also falls closest to the bad rating. The positive skewness shows that the distribution is skewed to the right.

This is graphically shown below:

The costs of membership

| Rating | Frequency |
|---|---|
| Very bad | 23 |
| Bad | 37 |
| Neither bad nor good | 21 |
| Good | 16 |
| Very good | 3 |

The mean rating was 2.39, the median and the mode were both 2, and the standard error was 0.11. The graph is skewed to the right. We can construct a 95% confidence interval as follows, taking T ≈ 1.984 (two-tailed 5% critical value of the t distribution, 99 degrees of freedom):

P(X – T Sx ≤ µ ≤ X + T Sx) = 95%

P(2.39 – 1.984 (0.11) ≤ µ ≤ 2.39 + 1.984 (0.11)) = 95%

P(2.172 ≤ µ ≤ 2.608) = 95%
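The direction of skew reported for the facilities and cost ratings can be checked from the frequencies alone. The sketch below uses the population (moment) coefficient of skewness, which is an assumption: a spreadsheet's sample-adjusted skewness will differ slightly in size but not in sign. The 1 to 5 rating codes and the function name `skewness` are also illustrative.

```python
def skewness(freq):
    """Moment coefficient of skewness for ratings given as {code: frequency}."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    # Second and third central moments, weighted by frequency.
    m2 = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in freq.items()) / n
    return m3 / m2 ** 1.5


# Assumed coding: 1 = very bad ... 5 = very good.
facilities = {1: 1, 2: 6, 3: 20, 4: 38, 5: 35}
cost = {1: 23, 2: 37, 3: 21, 4: 16, 5: 3}

skew_facilities = skewness(facilities)
skew_cost = skewness(cost)
print(f"facilities: {skew_facilities:.3f}, cost: {skew_cost:.3f}")
```

A negative value means the longer tail is on the left (low ratings are rare), and a positive value means the longer tail is on the right (high ratings are rare).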

From the above results it is evident that we offer a good range of facilities, that most of the participants are female, and that most are full members. However, there is room to improve the services rendered by our instructors, since 47 of the 100 participants rated them no higher than neither bad nor good. We also need to improve the quality of the equipment used, although the equipment ratings were spread roughly symmetrically around the middle rating, in an approximately bell-shaped distribution.

Another observation is that, in the overall rating of our centre, good was the joint most common rating (40 participants, level with neither bad nor good). The most used facility is the dance studio and the least used is the sauna. The outcome also indicated that most of our members are aged between 20 and 30 years. Finally, out of 100 participants, 53 would recommend our centre to friends and family members, while 47 would not.

Overall rating of the leisure centre

| Rating | Frequency |
|---|---|
| Very bad | 0 |
| Bad | 16 |
| Neither bad nor good | 40 |
| Good | 40 |
| Very good | 4 |

Facility used most often

| Facility | Code | Bin | Frequency |
|---|---|---|---|
| Fitness Centre (machines) | 1 | 1 | 27 |
| Dance studio (Pilates, yoga, dance) | 2 | 2 | 32 |
| Swimming Pool | 3 | 3 | 28 |
| (label missing) | 4 | 4 | 8 |
| Sauna | 5 | 5 | 5 |

Those who would recommend the centre to friends or family

| Response | Frequency |
|---|---|
| Yes | 53 |
| No | 47 |

Part two

Correlation and regression

We will investigate how unemployment has changed over time using the available data for Denmark. Time in years is the independent variable and the unemployment rate is the dependent variable.

Rate of unemployment in Denmark from 1970 to 1994

| Year | Denmark |
|---|---|
| 1970 | 0.6 |
| 1971 | 0.9 |
| 1972 | 0.8 |
| 1973 | 0.7 |
| 1974 | 2.8 |
| 1975 | 3.9 |
| 1976 | 5.1 |
| 1977 | 5.9 |
| 1978 | 6.7 |
| 1979 | 4.8 |
| 1980 | 5.2 |
| 1981 | 8.3 |
| 1982 | 8.9 |
| 1983 | 9 |
| 1984 | 8.5 |
| 1985 | 7.1 |
| 1986 | 5.4 |
| 1987 | 5.4 |
| 1988 | 6.1 |
| 1989 | 7.4 |
| 1990 | 7.7 |
| 1991 | 8.4 |
| 1992 | 9.2 |
| 1993 | 10.1 |
| 1994 | 8.2 |

Scatter diagram


Correlation coefficient (r)

The correlation coefficient measures the degree of linear relationship between two variables. In our case we determine it using the computational form of Pearson's product-moment formula [1]:

$$ r = \frac{n\sum XY - \sum X \sum Y}{\left(n\sum X^2 - (\sum X)^2\right)^{1/2}\left(n\sum Y^2 - (\sum Y)^2\right)^{1/2}} $$

| X | Year | Y (Denmark) | XY | X² | Y² |
|---|---|---|---|---|---|
| 1 | 1970 | 0.6 | 0.6 | 1 | 0.36 |
| 2 | 1971 | 0.9 | 1.8 | 4 | 0.81 |
| 3 | 1972 | 0.8 | 2.4 | 9 | 0.64 |
| 4 | 1973 | 0.7 | 2.8 | 16 | 0.49 |
| 5 | 1974 | 2.8 | 14 | 25 | 7.84 |
| 6 | 1975 | 3.9 | 23.4 | 36 | 15.21 |
| 7 | 1976 | 5.1 | 35.7 | 49 | 26.01 |
| 8 | 1977 | 5.9 | 47.2 | 64 | 34.81 |
| 9 | 1978 | 6.7 | 60.3 | 81 | 44.89 |
| 10 | 1979 | 4.8 | 48 | 100 | 23.04 |
| 11 | 1980 | 5.2 | 57.2 | 121 | 27.04 |
| 12 | 1981 | 8.3 | 99.6 | 144 | 68.89 |
| 13 | 1982 | 8.9 | 115.7 | 169 | 79.21 |
| 14 | 1983 | 9 | 126 | 196 | 81 |
| 15 | 1984 | 8.5 | 127.5 | 225 | 72.25 |
| 16 | 1985 | 7.1 | 113.6 | 256 | 50.41 |
| 17 | 1986 | 5.4 | 91.8 | 289 | 29.16 |
| 18 | 1987 | 5.4 | 97.2 | 324 | 29.16 |
| 19 | 1988 | 6.1 | 115.9 | 361 | 37.21 |
| 20 | 1989 | 7.4 | 148 | 400 | 54.76 |
| 21 | 1990 | 7.7 | 161.7 | 441 | 59.29 |
| 22 | 1991 | 8.4 | 184.8 | 484 | 70.56 |
| 23 | 1992 | 9.2 | 211.6 | 529 | 84.64 |
| 24 | 1993 | 10.1 | 242.4 | 576 | 102.01 |
| 25 | 1994 | 8.2 | 205 | 625 | 67.24 |
| Total: 325 | | 147.1 | 2334.2 | 5525 | 1066.93 |

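Substituting the column totals (n = 25, ∑X = 325, ∑Y = 147.1, ∑XY = 2334.2, ∑X² = 5525, ∑Y² = 1066.93) into the formula gives a correlation coefficient (r) of approximately 0.8245.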
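The formula for r can be evaluated from the column totals of the working table. This is a minimal sketch, assuming the totals are transcribed correctly from that table; the variable names are illustrative.

```python
import math

# Column totals from the working table (X = 1..25 for the years 1970 to 1994).
n = 25
sum_x, sum_y = 325, 147.1
sum_xy, sum_x2, sum_y2 = 2334.2, 5525, 1066.93

# Numerator and denominator of the computational formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)

r = numerator / denominator
print(f"r = {r:.4f}")
```

Working from totals keeps the calculation identical to the hand method, so each intermediate term can be compared with the table.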

Regression line

We use the classical linear estimation model: when Y = α + βX, the coefficients are estimated as [2]

$$ \beta = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad \alpha = \bar{Y} - \beta\bar{X} $$

Therefore in our case [3]

β = (25 × 2334.2 – 325 × 147.1) / (25 × 5525 – 325²) = 10547.5 / 32500 ≈ 0.3245

and, with mean values of 13 for X and 5.884 for Y,

α = 5.884 – 0.32454 × 13 ≈ 1.665

Our model is therefore stated as

Y = 1.665 + 0.3245 X

The autonomous level of unemployment (the intercept) is 1.665, and the model states that each additional unit of time (one year) increases the level of unemployment by 0.3245 units.
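The regression coefficients can also be recomputed from the raw yearly data rather than from the totals. The sketch below assumes the same coding as the working table (X = 1 for 1970 up to X = 25 for 1994); `predict` is a hypothetical helper added only to illustrate how the fitted line is used.

```python
# Unemployment rate in Denmark, 1970 to 1994, in year order.
rates = [0.6, 0.9, 0.8, 0.7, 2.8, 3.9, 5.1, 5.9, 6.7, 4.8, 5.2, 8.3, 8.9,
         9.0, 8.5, 7.1, 5.4, 5.4, 6.1, 7.4, 7.7, 8.4, 9.2, 10.1, 8.2]
x = list(range(1, len(rates) + 1))  # X = 1 for 1970, ..., 25 for 1994

n = len(rates)
sum_x = sum(x)
sum_y = sum(rates)
sum_xy = sum(xi * yi for xi, yi in zip(x, rates))
sum_x2 = sum(xi ** 2 for xi in x)

# Least-squares slope and intercept for Y = alpha + beta * X.
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha = sum_y / n - beta * sum_x / n


def predict(year):
    """Fitted unemployment rate for a calendar year, using X = year - 1969."""
    return alpha + beta * (year - 1969)


print(f"alpha = {alpha:.3f}, beta = {beta:.4f}")
print(f"fitted rate for 1994: {predict(1994):.2f}")
```

Comparing `predict(year)` with the observed rate for the same year gives the residual, which shows how far a single year departs from the linear trend.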

Over time there has been a rise in the level of unemployment despite high economic growth in developed countries. The data on Denmark's unemployment rate show that the rate increased over the years. Rising unemployment is a matter of concern to all economies in the world, which is why there have been increased efforts to use policy to bring down both unemployment and inflation.

In the model Y = 1.665 + 0.3245 X, Y is the level of unemployment and X is time in years (X = 1 for 1970). The slope states that each additional year increases the level of unemployment by 0.3245 units, so the model shows that unemployment rose over time. The autonomous level of unemployment, 1.665, is the fitted value at X = 0, that is, for the year immediately before the sample begins.

The correlation coefficient for the two variables is approximately 0.8245. The value shows a strong positive relationship between time and the unemployment rate. Even so, time alone does not determine unemployment: we have omitted other important variables, for example price levels or inflation, the level of national income, and government policies.


## References

P. Schmidt (1976) Econometrics, Marcel Dekker, USA

Sergio J. Ray (1956) Advances in Spatial Econometrics: Methodology, Tools and Applications, Springer, USA

J. Wooldridge (2002) Econometric Analysis of Cross Section and Panel Data, MIT Press, USA

[1] P. Schmidt (1976)

[2] P. Schmidt (1976)

[3] P. Schmidt (1976)
