Basan Shrestha's Diary: What Sample Proportion is a Statistically Significant Estimate of Population Proportion?, Statistical Note 37

Toss a coin 20 times as one sample and count the number of heads. Repeat the process to obtain seven samples, each consisting of 20 tosses. Calculate the sample proportion of heads for each sample and test whether each sample proportion is a statistically significant estimate of the population proportion. Discuss what makes a sample proportion a statistically significant estimate of the population proportion.

Key Words: Population Proportion, Sample Proportion, Standard Error, z-score, Statistical Significance, Sample Size

Introduction

Not all sample proportions are statistically significant estimates of the population proportion. Two questions then arise: which sample proportions are significant, and what determines that significance? Tossing a coin, a binary categorical random variable, serves here as an example to explain the statistical significance of sample proportions. Refer to my Statistical Note 36 for more on population and sample proportions and on the normality of the sampling distribution of sample proportions.

Observed Data

I tossed a coin 20 times to form one sample (S) and repeated the process to obtain seven samples (S1 to S7). Can one guess how many heads there will be in each sample? Table 1 presents the outcomes of the 20 tosses in each of the seven samples.

Table 1: Outcomes in 20 tosses of a coin in each of seven samples

Every sample of 20 tosses in Table 1 comes from a population consisting of the very large number of possible tosses. The section below discusses whether the sample proportions are statistically significant estimates of the population proportion.
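The experiment can also be imitated on a computer. The following is a minimal sketch, assuming a fair coin and Python's standard pseudo-random generator. The seed value and the names `toss_sample`, `SEED`, `head_counts` and `proportions` are illustrative choices, and the simulated counts are not the observed data in Table 1.

```python
import random

N_TOSSES = 20   # tosses per sample
N_SAMPLES = 7   # samples S1 to S7
SEED = 37       # arbitrary seed so the run is repeatable (an assumption)


def toss_sample(rng, n_tosses):
    """Return the number of heads in n_tosses simulated fair-coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))


rng = random.Random(SEED)
head_counts = [toss_sample(rng, N_TOSSES) for _ in range(N_SAMPLES)]
proportions = [heads / N_TOSSES for heads in head_counts]

for i, (heads, p_hat) in enumerate(zip(head_counts, proportions), start=1):
    print(f"S{i}: heads = {heads:2d}, sample proportion = {p_hat:.2f}")
```

Changing the seed gives a different set of seven samples, which is the sample-to-sample variation that the rest of this note measures.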

Discussion

The population proportion of heads, denoted by ‘p’, is 0.5 in a coin-tossing experiment with a fair coin. The sample proportion is denoted by ‘p^’ and read as ‘p-hat’. The sample proportion of heads in 20 tosses of a coin ranged from 0.35 to 0.65 (Table 1).

The mean of the sample proportions ‘p^’, also called the center, is the population proportion ‘p’. Symbolically, µ_p^ = p. In coin tossing, the mean of the sample proportions equals the population proportion, 0.5.

The standard deviation of the sample proportions is the square root of the population proportion multiplied by one minus the population proportion, divided by the sample size. It is referred to as the spread or Standard Error (SE) of the sample proportions and is denoted by σ_p^ = √[p x (1-p)/n], where ‘p’ is the population proportion and ‘n’ is the sample size. In this example, SE = √[0.5 x (1-0.5)/20] = 0.111803.
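The SE calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is an illustrative choice.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


se_20 = standard_error(0.5, 20)
print(f"SE for n = 20: {se_20:.6f}")   # prints 0.111803
```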

The sampling distribution of the sample proportions for samples of 20 tosses of a coin is approximately normal, with mean µ_p^ = 0.5 and standard error σ_p^ = 0.111803.
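The mean and spread stated above can be verified against the exact binomial distribution of the number of heads in 20 tosses. The sketch below assumes a fair coin and uses only the standard library; it also computes the exact probability that a sample proportion lies within 1.96 SE of 0.5, which the normal approximation puts at about 95 percent.

```python
import math

n, p = 20, 0.5

# Exact binomial probability of each possible number of heads, 0 to n.
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
p_hats = [k / n for k in range(n + 1)]

mean = sum(ph * pr for ph, pr in zip(p_hats, pmf))
sd = math.sqrt(sum((ph - mean) ** 2 * pr for ph, pr in zip(p_hats, pmf)))

# Exact probability that the sample proportion lies within 1.96 SE of p.
se = math.sqrt(p * (1 - p) / n)
within = sum(pr for ph, pr in zip(p_hats, pmf) if abs(ph - p) <= 1.96 * se)

print(f"mean of p-hat = {mean:.6f}")
print(f"sd of p-hat   = {sd:.6f}")
print(f"P(|p-hat - p| <= 1.96 SE) = {within:.4f}")
```

The exact probability comes out close to, but not exactly, 0.95 because the number of heads can take only whole values, so the normal curve is an approximation for a sample as small as 20.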

When the sampling distribution of the sample proportions follows the normal distribution, the z-score, or test statistic, is the difference between the sample proportion and the population proportion divided by the SE of the sample proportions. Symbolically, z = (p^ - p)/√[p x (1-p)/n]. In the normal distribution, a z-score of minus or plus 1.96 is a commonly used cut-off point for statistical significance: when the population proportion really is ‘p’, about 95 percent of samples have a z-score between minus 1.96 and plus 1.96, that is, a sample proportion within 1.96 SE of ‘p’. The range of z-scores between the negative and positive cut-off points therefore corresponds to the 95 percent interval p ± 1.96 x SE for the sample proportions. A z-score less than minus 1.96 or greater than plus 1.96 indicates that the sample proportion is statistically significant: it differs from ‘p’ by more than chance alone would usually produce at the 5 percent level, which suggests that the sample may come from a population with a different proportion. One point to note here is that, for a fixed difference between ‘p^’ and ‘p’, the magnitude of the z-score is proportional to the square root of the sample size ‘n’, not to ‘n’ itself, so the z-score grows in magnitude as the sample size increases.
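The effect of sample size on the z-score can be seen by holding the sample proportion fixed and changing only ‘n’. The sketch below uses an illustrative sample proportion of 0.60 (not a value from the tables) and a hypothetical helper named `z_score`; each fourfold increase in ‘n’ doubles the z-score.

```python
import math


def z_score(p_hat, p, n):
    """z = (p_hat - p) / sqrt(p * (1 - p) / n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


# Same difference (0.60 versus 0.50) at increasing sample sizes.
for n in (20, 80, 320):
    z = z_score(0.60, 0.5, n)
    print(f"n = {n:3d}: z = {z:.4f}, significant at 5%: {abs(z) > 1.96}")
```

In this illustration the same sample proportion is not significant with 20 or 80 tosses but becomes significant with 320 tosses.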

I calculated the z-score for all seven sample proportions in this example (Table 2). For example, for the sample proportion p^ = 0.35, z = (p^ - p)/σ_p^, where σ_p^ = √[p x (1-p)/n] = √[0.50 x 0.50/20] = 0.111803, so that z = (0.35 - 0.50)/0.111803 = -1.3416. Likewise, the z-score was calculated for each sample proportion and tabulated. Using the cut-off point of a z-score of minus or plus 1.96, none of the sample proportions in this example was found to be a statistically significant estimate of the population proportion of 0.50, given the sample size of 20 tosses of a coin.
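The same tabulation can be sketched in Python. Because the observed rows are not reproduced here, the sketch loops over every head count from 7 to 13, which covers each possible sample proportion in the observed range of 0.35 to 0.65; these are illustrative values and not necessarily the seven rows of Table 2.

```python
import math

P = 0.5          # population proportion for a fair coin
N = 20           # tosses per sample
Z_CUTOFF = 1.96  # 5 percent two-sided cut-off

se = math.sqrt(P * (1 - P) / N)

# Every head count from 7 to 13 gives a proportion between 0.35 and 0.65.
# These are illustrative values, not the observed rows of Table 2.
rows = []
for heads in range(7, 14):
    p_hat = heads / N
    z = (p_hat - P) / se
    rows.append((heads, p_hat, z, abs(z) > Z_CUTOFF))

print("heads  p-hat        z  significant")
for heads, p_hat, z, significant in rows:
    print(f"{heads:5d}  {p_hat:5.2f}  {z:7.4f}  {significant}")
```

Every z-score in this range lies between -1.3416 and 1.3416, so no proportion from 0.35 to 0.65 reaches the cut-off with only 20 tosses.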

Table 2: Sample proportions and their statistical significance as estimates of the population proportion, for samples of 20 tosses of a coin

With a sample size of 20 tosses of a coin, the cut-off sample proportions are 0.50 ± 1.96 x 0.111803, that is, 0.2809 and 0.7191. Sample proportions of 0.2809 or smaller and of 0.7191 or larger are statistically significant estimates of the population proportion of 0.50.

If the sample size is increased, some sample proportions between 0.2809 and 0.7191, which are not significant with 20 tosses, also become statistically significant estimates of the population proportion of 0.50. For example, if the sample size is increased to 50 tosses, sample proportions of 0.3614 or smaller and of 0.6386 or larger are statistically significant. Likewise, if the sample size is increased to 100 tosses, sample proportions of 0.4020 or smaller and of 0.5980 or larger are statistically significant.
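The cut-off sample proportions for any sample size follow from p ± 1.96 x SE. The sketch below, with a hypothetical helper named `cutoff_proportions`, prints them for 20, 50 and 100 tosses and converts each cut-off into a whole number of heads.

```python
import math

P = 0.5
Z_CUTOFF = 1.96


def cutoff_proportions(n, p=P, z=Z_CUTOFF):
    """Return (lower, upper) sample proportions at z = -1.96 and z = +1.96."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


for n in (20, 50, 100):
    lower, upper = cutoff_proportions(n)
    # Largest head count at or below the lower cut-off,
    # and smallest head count at or above the upper cut-off.
    max_low_heads = math.floor(lower * n)
    min_high_heads = math.ceil(upper * n)
    print(f"n = {n:3d}: p-hat <= {lower:.4f} or >= {upper:.4f} "
          f"(heads <= {max_low_heads} or >= {min_high_heads})")
```

The two cut-offs move toward 0.50 as ‘n’ grows, which is why a sample proportion that is not significant in a small sample can be significant in a large one.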

I am curious whether John Kerrich’s observed sample proportion of heads, 0.5067 in 10,000 tosses of a coin, is a statistically significant estimate of the population proportion of 0.50. With SE = √[0.50 x 0.50/10,000] = 0.005, the z-score is (0.5067 - 0.50)/0.005 = 1.34, which is below the cut-off point of 1.96. This sample therefore falls among the roughly 95 percent of samples of 10,000 tosses whose proportions lie within 1.96 SE of 0.50. Thus, this sample proportion is not a statistically significant estimate of the population proportion, 0.50.
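The Kerrich calculation can be reproduced as follows. The head count of 5,067 is the sample proportion 0.5067 multiplied by the 10,000 tosses; the variable names are illustrative. The sketch also prints the cut-off sample proportions for a sample of this size.

```python
import math

heads, n, p = 5067, 10_000, 0.5   # 0.5067 x 10,000 tosses, as quoted above

p_hat = heads / n
se = math.sqrt(p * (1 - p) / n)
z = (p_hat - p) / se

# Sample proportions at z = -1.96 and z = +1.96 for this sample size.
lower, upper = p - 1.96 * se, p + 1.96 * se

print(f"p-hat = {p_hat:.4f}, SE = {se:.4f}, z = {z:.2f}")
print(f"significant at 5%: {abs(z) > 1.96}")
print(f"cut-off proportions: {lower:.4f} and {upper:.4f}")
```

With 10,000 tosses the cut-offs are 0.4902 and 0.5098, so the observed 0.5067 lies inside them, even though such a narrow band would flag a departure from 0.50 of only about one percentage point.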

Conclusion

Not all sample proportions are statistically significant estimates of the population proportion. The sample size is pivotal in determining whether a sample proportion is statistically significant: the larger the sample, the smaller the standard error, and the closer to the population proportion the cut-off sample proportions lie.