Sampling methods and sample size calculation

## Introduction:

Sampling is an important issue when undertaking research. Various sampling methods can be used, and the method chosen must be specified in any study because doing so helps validate the sample used. Sampling methods include random sampling, cluster sampling and snowball sampling, and the choice among them depends on the nature of the study. In this paper we consider how the sample size can be calculated for a given population.

Sampling:

Sampling is important because it involves fewer respondents than studying the entire population, which saves time and money. An appropriate sample represents the population, in that the results derived from the sample describe the population with little error. A sample that is larger than necessary wastes time and money, while a sample that is too small gives inaccurate results.

Determining the sample size:

In any given study, the mean of the sample will generally differ from the mean of the population. The difference between the two is termed the sampling error, so when determining the sample size we need to consider the error we expect from this difference. The other factor to consider is the margin of error, which represents the maximum difference we are prepared to accept between the sample mean and the population mean at the chosen confidence level. We also consider the standard deviation of the population. We use it because we assume that the sample mean is approximately normally distributed, an assumption supported by the central limit theorem, which states that as the number of observations increases indefinitely, the distribution of their mean approaches a normal distribution.
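The reasoning above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the uniform population, the sample size of 50, the 2,000 repetitions and the function name `simulate_sample_means` are all hypothetical choices, not part of the study. It shows that individual sample means differ from the population mean, and that their spread is close to the population standard deviation divided by the square root of the sample size, which is the quantity that appears in the margin-of-error formula.

```python
import math
import random
import statistics


def simulate_sample_means(sample_size, replicates, seed=0):
    """Draw repeated samples from a uniform(0, 1) population and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(replicates)
    ]


# Hypothetical population: uniform on [0, 1), mean 0.5, standard deviation 1/sqrt(12).
population_mean = 0.5
population_sd = 1 / math.sqrt(12)

means = simulate_sample_means(sample_size=50, replicates=2000)

# Individual sample means differ from the population mean (sampling error).
largest_error = max(abs(m - population_mean) for m in means)

# The spread of the sample means is close to sd / sqrt(n).
observed_spread = statistics.stdev(means)
expected_spread = population_sd / math.sqrt(50)

print(f"largest sampling error: {largest_error:.4f}")
print(f"observed spread of sample means: {observed_spread:.4f}")
print(f"sd / sqrt(n): {expected_spread:.4f}")
```

Although the assumed population is not normal, the sample means cluster tightly around the population mean, which is the behaviour the sample size formulas rely on.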

Formula 1:

We calculate the margin of error as follows:

$$E = Z_{\alpha/2} \cdot \frac{\delta}{\sqrt{n}}$$

Where E is the margin of error

$Z_{\alpha/2}$ is the critical value from the Z distribution (1.96 for a 95% confidence level)

n is the sample size

δ is the population standard deviation

Making n the subject of this formula allows us to determine the sample size:

$$n = \left[\frac{Z_{\alpha/2} \cdot \delta}{E}\right]^{2}$$
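The rearrangement above can be checked with a short Python sketch. The function names and the input values below are hypothetical and chosen only for illustration. The critical value is used as it stands (1.96 for a 95% confidence level), and rounding up to the next whole respondent is an assumption of this sketch.

```python
import math


def sample_size_for_mean(z, std_dev, margin):
    """Return the unrounded sample size n = (z * std_dev / margin) ** 2."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return (z * std_dev / margin) ** 2


def margin_of_error(z, std_dev, n):
    """Return E = z * std_dev / sqrt(n), the formula that was rearranged."""
    return z * std_dev / math.sqrt(n)


# Hypothetical inputs chosen only for illustration.
z = 1.96          # critical value for a 95% confidence level
std_dev = 15.0    # assumed population standard deviation
target = 2.0      # desired margin of error

n_exact = sample_size_for_mean(z, std_dev, target)
n_needed = math.ceil(n_exact)  # round up so the target margin is not exceeded

print(f"unrounded n: {n_exact:.4f}")
print(f"sample size to use: {n_needed}")
print(f"margin of error achieved: {margin_of_error(z, std_dev, n_needed):.4f}")
```

Feeding the rounded sample size back into the margin-of-error formula confirms that the achieved margin does not exceed the target.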

Example:

Given that the expected margin of error is 0.4, the critical value $Z_{\alpha/2}$ is 1.96 and the population standard deviation is 6.9, we determine the sample size as follows:

$$n = \left[\frac{1.96 \times 6.9}{0.4}\right]^{2} = (33.81)^{2}$$

n = 1143.1161

In this case we will therefore use a sample size of n = 1,144, obtained by rounding the figure up to the next whole number so that the margin of error is not exceeded.

Cluster sampling:

For a clustered study we need to take the sampling design into account when calculating the sample size. After determining the sample size as shown above, we multiply it by the number of clusters and then multiply the result by an allowance for expected non-response or error, for example 5% (a factor of 1.05). This gives the total sample size, which we then divide by the number of clusters to determine the number n in each cluster.

For example, assume that we have 10 clusters and a 5% allowance for error. From the result above we obtain:

1143.1161 × 10 = 11431.161

11431.161 × 1.05 = 12002.72

We will use a total sample size of 12,000, so that each cluster has n = 1,200.
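The cluster adjustment described above can be sketched as a small function. This is only a sketch under stated assumptions: the name `cluster_sample_sizes` and the input values (a base sample size of 200 and 8 clusters) are hypothetical and do not come from the study.

```python
def cluster_sample_sizes(base_n, clusters, allowance=0.05):
    """Return (total, per_cluster) after scaling by the clusters and the allowance."""
    if clusters <= 0:
        raise ValueError("clusters must be positive")
    total = base_n * clusters * (1 + allowance)  # multiply by clusters, add the allowance
    per_cluster = total / clusters               # share the total equally among clusters
    return total, per_cluster


# Hypothetical inputs, chosen only to illustrate the steps.
total, per_cluster = cluster_sample_sizes(base_n=200, clusters=8, allowance=0.05)

print(f"total sample: {total:.2f}")
print(f"per cluster:  {per_cluster:.2f}")
```

Under this procedure the figure for each cluster equals the base sample size increased by the allowance, whatever the number of clusters, because the multiplication and division by the number of clusters cancel.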

Formula 2:

The other formula that can be used applies where we know the prevalence of the variable being studied, for example the prevalence rate of a disease. In this case we use the following formula:

$$n = \frac{Z^{2} \cdot x(1-x)}{E^{2}}$$

Stage one:

In this study we take the case-control design into account when determining the sample size. We use a 95% confidence level and a margin of error of 0.05. The sampling design comprises two sections, namely the married and the unmarried, so there will be a design effect when calculating the sample size to be used.

$$n = \frac{Z^{2} \cdot x(1-x)}{E^{2}}$$

Z = 1.96

x = 0.9 (the proportion who have married despite the risk of bearing children with ailments)

E = 0.05

This gives a sample size of 138.2976.

The controls were set at 4 for each case, so the sample size derived above is multiplied by 4. The result is 553.1904, which we round up to a sample size of 554.
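The stage-one arithmetic can be reproduced with a short Python sketch. The function name `sample_size_for_proportion` is hypothetical, while the input values are those given in the text.

```python
import math


def sample_size_for_proportion(z, prevalence, margin):
    """Return the unrounded n = z**2 * x * (1 - x) / margin**2."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    return z ** 2 * prevalence * (1 - prevalence) / margin ** 2


# Values taken from the stage-one text.
n_cases = sample_size_for_proportion(z=1.96, prevalence=0.9, margin=0.05)

controls_per_case = 4  # multiplier applied in the text
n_scaled = n_cases * controls_per_case

print(f"base sample size: {n_cases:.4f}")
print(f"after multiplying by {controls_per_case}: {n_scaled:.4f}")
print(f"rounded up: {math.ceil(n_scaled)}")
```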

Stage two:

In the formula, Z is the critical value for the chosen confidence level (for a 95% confidence level, Z = 1.96), E is the expected margin of error and x is the expected prevalence of the variable being studied. We use the above formula to determine the sample size to be used for the study on HBV, HCV and HIV.

The following table summarizes the sample sizes that will be considered in our study. We have to assume the variability of the population, which this formula captures through the term x(1-x) for the expected prevalence x. We use a 95% confidence level, which gives Z = 1.96. We use the formula $n = [Z^{2} \cdot x(1-x)]/E^{2}$ to determine the sample size as follows:

| | Prevalence (x) | Confidence level (Z) | Margin of error (E) | Z² | 1-x | x(1-x) | Z²·x(1-x) | E² | [Z²·x(1-x)]/E² |
|---|---|---|---|---|---|---|---|---|---|
| HBV | 0.2 | 1.96 | 0.04 | 3.8416 | 0.8 | 0.16 | 0.614656 | 0.0016 | 384.16 |
| HCV | 0.1 | 1.96 | 0.02 | 3.8416 | 0.9 | 0.09 | 0.345744 | 0.0004 | 864.36 |
| HIV | 0.05 | 1.96 | 0.025 | 3.8416 | 0.95 | 0.0475 | 0.182476 | 0.000625 | 291.9616 |

From the above table we therefore expect a sample size of n = 385 for HBV, n = 865 for HCV and n = 292 for HIV, rounding each figure up to the next whole number. The margins of error for the three samples will be 0.04, 0.02 and 0.025 for HBV, HCV and HIV respectively.
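The figures in the table can be recomputed with a few lines of Python. The sketch below uses the prevalence and margin-of-error values from the table. The function name is hypothetical, and rounding up to the next whole respondent is an assumption of the sketch.

```python
import math

Z = 1.96  # critical value for a 95% confidence level, as in the table

# (prevalence x, margin of error E) for each row of the table
inputs = {
    "HBV": (0.2, 0.04),
    "HCV": (0.1, 0.02),
    "HIV": (0.05, 0.025),
}


def sample_size_for_prevalence(z, x, e):
    """Return the unrounded n = z**2 * x * (1 - x) / e**2."""
    return z ** 2 * x * (1 - x) / e ** 2


sample_sizes = {
    name: sample_size_for_prevalence(Z, x, e) for name, (x, e) in inputs.items()
}

for name, n in sample_sizes.items():
    # round(n, 6) removes floating-point noise before rounding up
    print(f"{name}: n = {n:.4f}, rounded up = {math.ceil(round(n, 6))}")
```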

After determining the sample size we have to select the sample from the population. This is done using one of the sampling methods, for example random sampling, cluster sampling or snowball sampling. Once the sample is selected the study can be undertaken, and its results can be compared with those of a similar study undertaken afterwards on a different sample.
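As an illustration of the selection step, the sketch below draws a simple random sample without replacement. The sampling frame of 10,000 identifiers, the sample size of 300 and the seed are all hypothetical and stand in for a real list of population members and a calculated sample size.

```python
import random

# Hypothetical sampling frame: identifiers for 10,000 members of the population.
sampling_frame = [f"respondent-{i:05d}" for i in range(10_000)]

sample_size = 300  # assumed figure, stands in for a calculated sample size

rng = random.Random(42)  # fixed seed so the selection can be reproduced
selected = rng.sample(sampling_frame, k=sample_size)  # draws without replacement

print(len(selected), "respondents selected, for example:", selected[:3])
```

Because every member of the frame has the same chance of selection, this corresponds to random sampling. Cluster and snowball sampling would need a different selection procedure.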

If the study is to be undertaken in clusters, whereby equal samples are selected from each region, then we need to consider the sample size for each region. The number of clusters is the number of regions, which is equal to 20. We multiply each sample size by 20, apply a 5% allowance for non-response and the errors involved, and then divide the value by 20. This is summarized in the table below:

| | Sample size (n) | Number of clusters (x) | Contingency of 5% (c) | nx | (nx) × 1.05 | [(nx) × 1.05]/20 |
|---|---|---|---|---|---|---|
| HBV | 384.16 | 20 | 0.05 | 7683.2 | 8067.36 | 403.368 |
| HCV | 864.36 | 20 | 0.05 | 17287.2 | 18151.56 | 907.578 |
| HIV | 291.9616 | 20 | 0.05 | 5839.232 | 6131.194 | 306.55968 |

From the above calculations it is evident that we need to interview approximately 403 respondents for HBV, 908 for HCV and 307 for HIV. However, because we need to cluster the data, we round each sample size so that it divides evenly by 20. For this reason we choose 400 for HBV, 900 for HCV and 300 for HIV.
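The contingency calculation and the final rounding can be sketched in Python as follows. The unrounded sample sizes, the 20 clusters and the 5% contingency are taken from the tables. The function names are hypothetical, and rounding to the nearest multiple of 20 is an assumption about how the final figures were chosen, although it does reproduce them.

```python
CLUSTERS = 20        # number of regions, from the text
CONTINGENCY = 0.05   # 5% allowance for non-response and errors

# Unrounded sample sizes from the prevalence table
base_sizes = {"HBV": 384.16, "HCV": 864.36, "HIV": 291.9616}


def with_contingency(n, clusters, contingency):
    """Apply the table's steps: n * clusters, add the allowance, divide by clusters."""
    return n * clusters * (1 + contingency) / clusters


def nearest_multiple(value, step):
    """Round value to the nearest whole multiple of step."""
    return step * round(value / step)


adjusted = {k: with_contingency(n, CLUSTERS, CONTINGENCY) for k, n in base_sizes.items()}
chosen = {k: nearest_multiple(v, CLUSTERS) for k, v in adjusted.items()}

for name in base_sizes:
    print(f"{name}: adjusted = {adjusted[name]:.3f}, chosen = {chosen[name]}")
```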
