Problem 1.3:

a. frequency and percentage frequency for sold houses:

Vacant houses are not yet sold, so the sold houses are those that are either owner occupied or occupied by tenants.

Owner occupied:

The first step is to identify the houses that are owner occupied. The EXACT function in Excel tests whether the occupied variable matches the owner-occupied value, returning TRUE if it does and FALSE if it does not.

Tenants:

The next step is to identify the houses that are occupied by tenants. The EXACT function in Excel tests whether the occupied variable matches the tenant-occupied value, returning TRUE if it does and FALSE if it does not.

Sold houses:

A house that is occupied has been sold. To identify the sold houses the OR function is used; it returns TRUE if either of the two EXACT results above is TRUE.

Number of bedrooms:

To identify the number of bedrooms for the sold houses the IF function is used: if the OR result is TRUE the IF result is 1, and if the OR result is FALSE the IF result is 0. The number of bedrooms from the original data is then multiplied by the IF result to give a new variable, named sold and number of bedrooms, which equals the number of bedrooms for a sold house and 0 otherwise.
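The EXACT, OR and IF steps above can also be sketched outside Excel. The snippet below is a minimal Python illustration, not the worksheet itself; the status labels ("owner", "tenant", "vacant"), the function name and the sample rows are hypothetical, since the coding of the occupied variable is not shown here.

```python
# Hypothetical sample rows: (occupied status, number of bedrooms).
# The labels "owner", "tenant" and "vacant" are assumed, not taken from the data set.
houses = [("owner", 3), ("vacant", 4), ("tenant", 2), ("owner", 3)]

def sold_bedrooms(status, bedrooms):
    is_owner = status == "owner"      # plays the role of the first EXACT test
    is_tenant = status == "tenant"    # plays the role of the second EXACT test
    is_sold = is_owner or is_tenant   # plays the role of OR
    flag = 1 if is_sold else 0        # plays the role of IF(..., 1, 0)
    return flag * bedrooms            # 0 marks a house that is not sold

sold_and_bedrooms = [sold_bedrooms(s, b) for s, b in houses]
print(sold_and_bedrooms)
```

Because vacant houses get the value 0, counting only the values 1 to 7 afterwards leaves them out of the frequency table.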

Frequency table:

To build the frequency table the COUNTIF function is used. It counts how many times each of the values 1, 2, 3, 4, 5, 6 and 7 occurs in the new variable sold and number of bedrooms, one value for each of the classes 1 to 7. The table below summarizes the frequencies and percentage frequencies:

bedrooms    frequency    percentage frequency
1           3            0.5128%
2           67           11.4530%
3           348          59.4872%
4           150          25.6410%
5           15           2.5641%
6           1            0.1709%
7           1            0.1709%
total       585          100%

Where the percentage frequency of class 1 = 3 / 585 * 100 = 0.5128%
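As a cross-check, the percentage frequencies can be recomputed from the reported counts. The following is a small Python sketch under the assumption that the frequencies in the table are correct; the variable names are illustrative only.

```python
# Frequencies reported in the table (bedrooms -> number of sold houses).
frequency = {1: 3, 2: 67, 3: 348, 4: 150, 5: 15, 6: 1, 7: 1}

# The total is the denominator for every percentage frequency.
total = sum(frequency.values())

# Percentage frequency = class frequency / total * 100.
percentage = {k: v / total * 100 for k, v in frequency.items()}

for bedrooms, pct in percentage.items():
    print(f"{bedrooms}  {frequency[bedrooms]:>4}  {pct:.4f}%")
print("total", total)
```

The total comes to 585, which is the denominator used for every class.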

b. polygon and histogram:

The chart below is the histogram.

The chart below is the percentage frequency polygon.

c. cumulative frequency polygon

Problem 1.4:

a. Compute the mean, median and quartiles:

mean              154863.177
median            130000
first quartile    99000
third quartile    170162.5

b. Compute the measures of dispersion, the skewness and the Z scores:

Variance                    1.5108E+10
Standard deviation          122912.8072
Range                       1558000
Interquartile range         71162.5
Coefficient of variation    0.793686464
Skewness                    6.300663412

Z scores for the price variable

Price (example)    Z score
52000              -0.836879241
100000             -0.446358505
200000             0.367226363
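Several of the figures above follow directly from the mean, standard deviation and quartiles. The Python sketch below re-derives the interquartile range, the coefficient of variation and the example Z scores from the reported values; it assumes those reported values are correct and does not use the raw data.

```python
# Summary statistics as reported for the price variable.
mean = 154863.177
std_dev = 122912.8072
q1, q3 = 99000, 170162.5

iqr = q3 - q1        # interquartile range = third quartile - first quartile
cv = std_dev / mean  # coefficient of variation = standard deviation / mean

def z_score(price):
    # Number of standard deviations a price lies above or below the mean.
    return (price - mean) / std_dev

print("IQR:", iqr)
print("CV:", round(cv, 6))
for price in (52000, 100000, 200000):
    print(price, round(z_score(price), 6))
```

A negative Z score marks a price below the mean, and a positive one marks a price above it.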

c. skewness

The skewness value is 6.3, which means the data is positively skewed: most prices are relatively low, and a small number of very high prices stretch the right tail of the distribution.
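To illustrate what positive skewness looks like, the sketch below applies the adjusted sample skewness formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, to a small made-up list of prices. The prices are hypothetical and chosen only to show the effect of one very high value; that the worksheet used this exact formula is an assumption.

```python
import statistics

def sample_skewness(values):
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3),
    # where s is the sample standard deviation.
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Hypothetical prices: five moderate values and one very high value.
prices = [90000, 100000, 110000, 120000, 130000, 900000]

skew = sample_skewness(prices)
print("skewness:", round(skew, 3))
print("mean:", statistics.mean(prices), "median:", statistics.median(prices))
```

In this example a single high price is enough to make the skewness positive and to pull the mean above the median.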

d. results:

The mean house price is 154863.177, and the standard deviation of 122912.8072 measures how far prices typically deviate from the mean. The mean is greater than the median, which indicates that the data is positively skewed. The range is the difference between the largest and the smallest value, and in this case it is 1558000. Finally, the skewness value shows that a small number of very high prices pull the distribution to the right.

e. proportion of house prices that are within 1, 2 and 3 standard deviations (SD) of the mean:

                      +/- 1 SD       +/- 2 SD       +/- 3 SD
mean                  154,863.18     154,863.18     154,863.18
standard deviation    122,912.81     122,912.81     122,912.81
upper limit           277,775.98     400,688.79     523,601.60
lower limit           31,950.37      -90,962.44     -213,875.24
number                989            1055           1066
proportion            92%            98%            99%

f. Compare with the empirical rule:

The empirical rule states that for a normal distribution about 68% of the data lies within 1 standard deviation of the mean, 95% within 2 standard deviations and 99.7% within 3 standard deviations. In the data above, 92% of the prices lie within 1 standard deviation of the mean, compared with the 68% expected under the rule, so the price distribution does not follow a normal distribution.
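The limits and the comparison with the empirical rule can be laid out side by side. The Python sketch below uses only the reported mean, standard deviation and proportions; the helper name limits is illustrative.

```python
# Reported summary statistics for the price variable.
mean = 154863.177
std_dev = 122912.8072

# Reported proportions versus the empirical-rule benchmarks for k = 1, 2, 3.
observed = {1: 0.92, 2: 0.98, 3: 0.99}
empirical = {1: 0.68, 2: 0.95, 3: 0.997}

def limits(k):
    # Lower and upper limits of the interval mean +/- k standard deviations.
    return mean - k * std_dev, mean + k * std_dev

for k in (1, 2, 3):
    lower, upper = limits(k)
    gap = observed[k] - empirical[k]
    print(f"k={k}: [{lower:,.2f}, {upper:,.2f}] "
          f"observed {observed[k]:.0%}, rule {empirical[k]:.1%}, gap {gap:+.1%}")
```

The largest gap is at 1 standard deviation, which is where the departure from the normal pattern is most visible.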

Problem 1.5:

a. covariance between price and size:

Covariance = 94168317.66

b. correlation between price and size:

Correlation = 0.760689174
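The link between the two measures is that the correlation is the covariance divided by the product of the two standard deviations. The Python sketch below shows this on a small made-up data set; the sizes and prices are hypothetical, and the sample (n - 1) form of the covariance is an assumption, since the worksheet formula is not shown.

```python
import statistics

def covariance(xs, ys):
    # Sample covariance: average product of deviations, dividing by n - 1.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Correlation = covariance / (sample SD of x * sample SD of y).
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical house sizes and prices, for illustration only.
sizes = [1000, 1500, 2000, 2500]
prices = [100000, 140000, 210000, 240000]

r = correlation(sizes, prices)
print("covariance:", covariance(sizes, prices))
print("correlation:", round(r, 4))
```

Unlike the covariance, the correlation has no units and always lies between -1 and 1, which is why it is the easier of the two to interpret.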

c. scatter plot between price and size

d. relationship between price and size

The correlation coefficient of 0.76 indicates a strong positive relationship between price and size. The scatter diagram depicts this relationship: as size increases, price also increases.

e. conclusions about the relationship between price and size

The size of a house influences its price: the larger the house the higher the price, and the smaller the house the cheaper it is.
