Analyzing Cross-sectional Data and Correlation and Regression
Part one
Analyzing cross sectional data
This report describes the results of a survey of our members on the quality of the services we offer to both full members and weekend-only members. The survey aimed to establish the level of customer satisfaction with our services and equipment. There were 100 participants: 55 full members and 45 weekend-only members, of whom 47 were male and 53 were female.
Membership type            Frequency
Full membership            55
Weekend only membership    45

Gender     Code    Bin    Frequency
Male       1       1      47
Female     2       2      53
The rating of the quality of our centre was based on the quality of instructors, the quality of the equipment used, the range of facilities and the cost of membership. Together, these ratings indicate the level of customer satisfaction with our services.
For the quality of instructors, 12 of the 100 participants rated the service as bad, 35 as neither bad nor good, 39 as good and only 14 as very good. Good was therefore the most common rating, and 53 participants rated the instructors as good or very good. This can be represented as follows:
Quality of instructors
Rating                  Frequency
Very bad                0
Bad                     12
Neither bad nor good    35
Good                    39
Very good               14
As the graph shows, the mode is good. The distribution is slightly negatively skewed (skewed to the left), meaning that the longer tail lies on the left while most observations fall on the right, among the higher ratings. Another notable observation is that none of the participants rated our instructors as very bad.
The mean rating was 3.55 and the sample standard deviation was 0.88048 (sample variance 0.7753), so with n = 100 the standard error of the mean is 0.88048/√100 = 0.088048. The 95% confidence interval for the population mean can be calculated as follows:
P(X – T Sx ≤ µ ≤ X + T Sx) = 95%
where X is the sample mean, T is the two-tailed 5% critical value from the t table and Sx is the standard error of the mean. With 99 degrees of freedom, T is approximately 1.984.
P(3.55 – 1.984(0.088048) ≤ µ ≤ 3.55 + 1.984(0.088048)) = 95%
Therefore we get the confidence interval as
P(3.375 ≤ µ ≤ 3.725) = 95%
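The summary statistics and the interval for instructor quality can be recomputed directly from the frequency table. The following is a minimal sketch, not the method used to produce the report: the function name `summarize` and the hard-coded critical value 1.984 (roughly the two-sided 95% t value for 99 degrees of freedom) are assumptions introduced for illustration.

```python
import math

# Frequencies of instructor-quality ratings, coded 1 (very bad) to 5 (very good).
ratings = {1: 0, 2: 12, 3: 35, 4: 39, 5: 14}


def summarize(freq, critical_value=1.984):
    """Return mean, sample standard deviation, standard error and CI bounds.

    critical_value is an assumption: about the two-sided 95% t value for
    99 degrees of freedom. Use 1.96 for the normal approximation.
    """
    n = sum(freq.values())
    mean = sum(code * count for code, count in freq.items()) / n
    # Sum of squared deviations, then sample variance with n - 1.
    ss = sum(count * (code - mean) ** 2 for code, count in freq.items())
    sd = math.sqrt(ss / (n - 1))
    se = sd / math.sqrt(n)
    margin = critical_value * se
    return mean, sd, se, (mean - margin, mean + margin)


mean, sd, se, (low, high) = summarize(ratings)
print(f"mean={mean:.2f} sd={sd:.5f} se={se:.6f} CI=({low:.3f}, {high:.3f})")
```

The same function can be applied to the other rating tables by passing their frequencies in place of `ratings`.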
Our services were also rated on the quality of the equipment used. Of the 100 participants, 6 rated the equipment as very bad, 25 as bad, 33 as neither bad nor good, 27 as good and only 9 as very good.
The most common rating was therefore neither bad nor good. The mean was 3.08, the standard error was 0.106059, and the median and the mode were both 3. The results can be represented as follows:
Quality of any equipment used
Rating                  Frequency
Very bad                6
Bad                     25
Neither bad nor good    33
Good                    27
Very good               9
We can construct a 95% confidence interval as follows, taking T ≈ 1.984, the two-tailed 5% critical value of the t distribution with 99 degrees of freedom:
P(X – T Sx ≤ µ ≤ X + T Sx) = 95%
P(3.08 – 1.984(0.106059) ≤ µ ≤ 3.08 + 1.984(0.106059)) = 95%
P(2.870 ≤ µ ≤ 3.290) = 95%
Our services were also rated on the range of facilities available. Of the 100 participants, only one rated the range of facilities as very bad, 6 rated it as bad, 20 as neither bad nor good, 38 as good and the remaining 35 as very good. In total, therefore, 73 participants rated the range of facilities we offer as good or very good. This is shown below:
The range of facilities available
Rating                  Frequency
Very bad                1
Bad                     6
Neither bad nor good    20
Good                    38
Very good               35
The mean was 4, which shows that on average the range of facilities was rated as good; the mode was also 4, which indicates that good was the most common rating. Where the mode, the median and the mean are equal, a distribution is usually symmetric or bell shaped, with deviations on either side of the mean mirroring each other. Here, however, the negative value of skewness indicates that the distribution has a longer tail to the left. The standard error was 0.094281, so, taking T ≈ 1.984 (the two-tailed 5% critical value of the t distribution with 99 degrees of freedom), we can construct a 95% confidence interval as follows:
P(X – T Sx ≤ µ ≤ X + T Sx) = 95%
P(4 – 1.984(0.094281) ≤ µ ≤ 4 + 1.984(0.094281)) = 95%
P(3.813 ≤ µ ≤ 4.187) = 95%
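The mean, median, mode and sign of the skewness quoted for the range of facilities can be checked against the frequency table. This is a minimal sketch under one stated assumption: skewness is taken to be the simple moment-based coefficient. Spreadsheet packages usually apply a small-sample adjustment, so their figure may differ slightly in size, though the sign should agree.

```python
import statistics

# Frequencies for "range of facilities", coded 1 (very bad) to 5 (very good).
facilities = {1: 1, 2: 6, 3: 20, 4: 38, 5: 35}

# Expand the frequency table into one value per respondent.
values = [code for code, count in facilities.items() for _ in range(count)]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Moment-based skewness (an assumed definition): the average cubed deviation
# from the mean divided by the cubed population standard deviation.
n = len(values)
sd_pop = statistics.pstdev(values)
skewness = sum((v - mean) ** 3 for v in values) / (n * sd_pop ** 3)

print(f"mean={mean} median={median} mode={mode} skewness={skewness:.3f}")
```

A negative result confirms the longer left tail even though the three measures of central tendency coincide.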
The ratings of the cost of membership indicate that most people rate the cost poorly and only a few rate it as very good. Only 3 out of 100 rated the cost of membership as very good, 16 as good, 21 as neither bad nor good, 37 as bad and 23 as very bad. The modal rating was bad, and 60 participants rated the cost as bad or very bad. The mean rating was 2.39, which lies closest to the bad rating. The positive skewness of this distribution shows that it is skewed to the right.
This is shown below:
The costs of membership
Rating                  Frequency
Very bad                23
Bad                     37
Neither bad nor good    21
Good                    16
Very good               3
The mean was 2.39, the median and the mode were both 2, and the standard error was 0.11. The graph is skewed to the right. Taking T ≈ 1.984 (the two-tailed 5% critical value of the t distribution with 99 degrees of freedom), we can construct a 95% confidence interval as follows:
P(X – T Sx ≤ µ ≤ X + T Sx) = 95%
P(2.39 – 1.984(0.11) ≤ µ ≤ 2.39 + 1.984(0.11)) = 95%
P(2.172 ≤ µ ≤ 2.608) = 95%
From the above results it is evident that we offer a wide range of facilities, that slightly more of the participants are female than male, and that most are full members. However, there is room to improve the services rendered by our instructors, since 47 of the 100 participants rated them below good. We also need to improve the quality of the equipment used, whose ratings were centred on neither bad nor good in a roughly symmetric, bell-shaped distribution.
Another observation is that the overall rating of our centre was most often good or neither bad nor good (40 participants each), with only 16 rating it as bad and none as very bad. The most used facility is the dance studio and the least used is the sauna. The results also indicated that most of our members are aged between 20 and 30 years. Finally, out of the 100 participants, 53 would recommend our centre to friends and family members, while 47 would not.
Overall rating of the leisure centre
Rating                  Frequency
Very bad                0
Bad                     16
Neither bad nor good    40
Good                    40
Very good               4
Facility used most often
Facility                                  Code    Bin    Frequency
Fitness Centre (machines)                 1       1      27
Dance studio (Pilates, yoga, dance)       2       2      32
Swimming Pool                             3       3      28
Games Room (football, badminton etc.)     4       4      8
Sauna                                     5       5      5
Those who would recommend the centre to friends or family
Response    Frequency
Yes         53
No          47
Part two
Correlation and regression
We will investigate the change in unemployment over time using the available data for Denmark. Time in years is therefore the independent variable and the unemployment rate is the dependent variable.
Rate of unemployment in Denmark from the year 1970 to 1994
Year    Denmark
1970    0.6
1971    0.9
1972    0.8
1973    0.7
1974    2.8
1975    3.9
1976    5.1
1977    5.9
1978    6.7
1979    4.8
1980    5.2
1981    8.3
1982    8.9
1983    9
1984    8.5
1985    7.1
1986    5.4
1987    5.4
1988    6.1
1989    7.4
1990    7.7
1991    8.4
1992    9.2
1993    10.1
1994    8.2
Scatter diagram
Correlation coefficient (r)
The correlation coefficient measures the degree of linear relationship between two variables. In our case we will determine it using the product-moment (Pearson) formula, where [1]
$$r = \frac{n\sum XY - \sum X \sum Y}{\left(n\sum X^2 - (\sum X)^2\right)^{1/2}\left(n\sum Y^2 - (\sum Y)^2\right)^{1/2}}$$
X     Year    Y (Denmark)    XY        X²      Y²
1     1970    0.6            0.6       1       0.36
2     1971    0.9            1.8       4       0.81
3     1972    0.8            2.4       9       0.64
4     1973    0.7            2.8       16      0.49
5     1974    2.8            14        25      7.84
6     1975    3.9            23.4      36      15.21
7     1976    5.1            35.7      49      26.01
8     1977    5.9            47.2      64      34.81
9     1978    6.7            60.3      81      44.89
10    1979    4.8            48        100     23.04
11    1980    5.2            57.2      121     27.04
12    1981    8.3            99.6      144     68.89
13    1982    8.9            115.7     169     79.21
14    1983    9              126       196     81
15    1984    8.5            127.5     225     72.25
16    1985    7.1            113.6     256     50.41
17    1986    5.4            91.8      289     29.16
18    1987    5.4            97.2      324     29.16
19    1988    6.1            115.9     361     37.21
20    1989    7.4            148       400     54.76
21    1990    7.7            161.7     441     59.29
22    1991    8.4            184.8     484     70.56
23    1992    9.2            211.6     529     84.64
24    1993    10.1           242.4     576     102.01
25    1994    8.2            205       625     67.24
Total 325             147.1          2334.2    5525    1066.93
37/40
Cross-sectional Data and Correlation and Regression
Substituting the column totals with n = 25 gives r = 10547.5 / (32500 × 5034.84)^(1/2), so our correlation coefficient (r) is approximately 0.8245.
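Recomputing the coefficient from the tabulated series is a useful check on the arithmetic. The sketch below applies the product-moment formula with time coded X = 1 for 1970 through X = 25 for 1994, as in the table. The function name `pearson_r` is introduced here for illustration only.

```python
import math

# Denmark unemployment rate, 1970 to 1994, as listed in the table.
rates = [0.6, 0.9, 0.8, 0.7, 2.8, 3.9, 5.1, 5.9, 6.7, 4.8, 5.2, 8.3, 8.9,
         9.0, 8.5, 7.1, 5.4, 5.4, 6.1, 7.4, 7.7, 8.4, 9.2, 10.1, 8.2]
periods = list(range(1, len(rates) + 1))  # X = 1 for 1970, ..., 25 for 1994


def pearson_r(x, y):
    """Product-moment correlation computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(periods, rates)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

Because the formula works from the same five sums as the table (n, ∑X, ∑Y, ∑XY, ∑X², ∑Y²), each intermediate value can be compared against the totals row.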
Regression line
We use the classical estimation model, which states that when Y = α + βX, the parameters are estimated as [2]
$$\alpha = \bar{Y} - \beta \bar{X}$$
and [3]
$$\beta = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
Therefore in our case, with n = 25, mean of X = 325/25 = 13 and mean of Y = 147.1/25 = 5.884,
β = 10547.5 / 32500 = 0.3245
and
α = 5.884 – 0.32454(13) = 1.665
Our model will therefore be stated as
Y = 1.665 + 0.3245 X
The autonomous level of unemployment (the fitted value at X = 0) is 1.665, and the model states that an increase of one unit of time (one year) will increase the level of unemployment by 0.3245 units.
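The slope and intercept can be recomputed from the series with the least-squares formulas given for β and α. This is a minimal sketch, assuming time is coded X = 1 for 1970 through X = 25 for 1994. The helper name `fit_line` is hypothetical.

```python
# Denmark unemployment rate, 1970 to 1994, as listed in the table.
rates = [0.6, 0.9, 0.8, 0.7, 2.8, 3.9, 5.1, 5.9, 6.7, 4.8, 5.2, 8.3, 8.9,
         9.0, 8.5, 7.1, 5.4, 5.4, 6.1, 7.4, 7.7, 8.4, 9.2, 10.1, 8.2]
periods = list(range(1, len(rates) + 1))  # X = 1 for 1970, ..., 25 for 1994


def fit_line(x, y):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The fitted line passes through the point of means.
    alpha = sum_y / n - beta * sum_x / n
    return alpha, beta


alpha, beta = fit_line(periods, rates)
print(f"alpha = {alpha:.3f}, beta = {beta:.4f}")

# Fitted value for the last year in the sample (X = 25, i.e. 1994).
fitted_1994 = alpha + beta * periods[-1]
print(f"fitted value for 1994 = {fitted_1994:.2f}")
```

Comparing a fitted value with the observed rate for the same year gives a quick sense of how far individual years sit from the trend line.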
Over time there has been a rise in the level of unemployment despite high economic growth in developed countries. The data on Denmark’s unemployment rate show an upward trend over the years. Rising unemployment is a matter of concern to all economies in the world, which is why there have been increased efforts to use policy to bring down both unemployment and inflation.
The model is specified as Y = 1.665 + 0.3245 X, where Y is the level of unemployment and X is time in years (X = 1 for 1970). The positive slope shows that the level of unemployment rises over time, by about 0.3245 units per year on average, while the intercept of 1.665 is the level that the fitted line implies for the year before the series begins.
The correlation coefficient for the two variables is approximately 0.8245. The value shows a strong positive linear relationship between time and the unemployment rate, with time accounting for about 68% of the variation (r² ≈ 0.68). The remaining variation may reflect important variables that we have omitted and that also determine the level of unemployment, for example price levels or inflation, the level of national income and government policies.
References
Schmidt, P. (1976) Econometrics, Marcel Dekker, USA
Ray, S. J. (1956) Advances in Spatial Econometrics: Methodology, Tools and Applications, Springer, USA
Wooldridge, J. (2002) Econometric Analysis of Cross Section and Panel Data, MIT Press, USA
[1] P. Schmidt (1976)
[2] P. Schmidt (1976)
[3] P. Schmidt (1976)