Family Tree

About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; Training Institute for Technical Instruction (TITI) from July to Sep 2011; UN Women Nepal from Sep to Dec 2011; Mercy Corps Nepal from 24 Jan 2012 to 14 August 2016; and CAMRIS International in Nepal from 1 February 2017. I have published articles to my credit.

Monday, October 8, 2018

What Sample Proportion is a Statistically Significant Estimate of Population Proportion?, Statistical Note 37

Toss a coin 20 times in a sample and count the number of heads. Repeat the same process for seven samples each constituting 20 tosses. Calculate the sample proportion of heads for each sample and test whether each sample proportion is a significant estimate of the population proportion. Discuss what makes the sample proportion a statistically significant estimate of the population proportion.

Key Words: Population Proportion, Sample Proportion, Standard Error, z-score, Statistical Significance, Sample Size

Introduction

Not all sample proportions are statistically significant estimates of the population proportion. The questions then arise of which sample proportions are significant and what makes them so. Tossing a coin, a binary categorical random variable, serves as an example to explain the statistical significance of sample proportions. Refer to my Statistical Note 36 for more about population and sample proportions and the normality of the sampling distribution of sample proportions.

Observed Data

I tossed a coin 20 times to form one sample (S) and repeated the process for seven samples (S1 to S7). Can one guess how many heads there will be in each sample? Table 1 presents the outcomes of 20 tosses of a coin in each of the seven samples.

Table 1: Outcomes in 20 tosses of a coin in each of seven samples
















Every sample of 20 tosses in Table 1 comes from a population consisting of a large number of possible tosses. The section below discusses whether the sample proportions are statistically significant estimates of the population proportion.

Discussion

The population proportion, denoted by ‘p’, is 0.5 in coin-tossing experiments. The sample proportion is denoted by ‘p^’ and read as ‘p-hat’. The sample proportion of heads in 20 tosses of a coin ranged from 0.35 to 0.65 (Table 1).

Mean of sample proportions ‘p^’, also called center, is the population proportion ‘p’. Symbolically, it is indicated by µp^=p. In coin tossing, 0.5 is the mean of sample proportions or the population proportion.

The standard deviation of sample proportions is the square root of the population proportion multiplied by one minus the population proportion, divided by the sample size. It is referred to as the spread or Standard Error (SE) of sample proportions and is denoted by σp^ = √[p x (1-p)/n], where ‘p’ is the population proportion and ‘n’ is the sample size. In this example, SE = √[0.5 x (1-0.5)/20] = 0.111803.

With a sample size of 20 tosses of a coin, the sampling distribution of sample proportions is approximately normal with mean p=0.5 and standard error σp^=0.111803.

When the sampling distribution of sample proportions is approximately normal, the z-score, or test statistic, is the difference between the sample proportion and the population proportion divided by the SE of sample proportions. Symbolically, z = (p^-p)/√[p x (1-p)/n]. A z-score of minus or plus 1.96 is the commonly used cut-off point: about 95 percent of samples drawn from a population with proportion ‘p’ give a z-score between minus 1.96 and plus 1.96. The sample proportions whose z-scores lie between the negative and positive cut-off points form the interval that is consistent with ‘p’ at the 95 percent confidence level. A z-score less than minus 1.96 or greater than plus 1.96 indicates that the sample proportion is statistically significant, that is, it differs from ‘p’ by more than sampling variability alone would usually produce and may come from a different population. One point to note is that, for a fixed difference between ‘p^’ and ‘p’, the z-score is proportional to the square root of the sample size ‘n’, so the z-score increases as the sample size increases.

I calculated the z-score for all seven sample proportions in this example (Table 2). For example, for the sample proportion p^=0.35, z = (p^-p) / σp^, where σp^ = √[p x (1-p)/n] = √[0.50 x 0.50/20] = 0.111803, so that z = (0.35-0.50) / 0.111803 = -1.3416. The z-score was calculated likewise for each sample proportion and tabulated. Using the cut-off point of minus or plus 1.96, none of the sample proportions in this example was found to be a significant estimate of the population parameter of 0.50, given the sample size of 20 tosses of a coin.
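The calculation above can be sketched in Python. This is only an illustration: the function names `standard_error` and `z_score` are my own, and because the full table is not reproduced here, the loop covers only the smallest and largest sample proportions reported in the text (0.35 and 0.65).

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def z_score(p_hat, p, n):
    """Distance of the sample proportion from p, measured in standard errors."""
    return (p_hat - p) / standard_error(p, n)

p, n = 0.5, 20
# Only the two extreme sample proportions mentioned in the text are used here.
for p_hat in (0.35, 0.65):
    z = z_score(p_hat, p, n)
    verdict = "beyond" if abs(z) > 1.96 else "within"
    print(f"p_hat={p_hat:.2f}  z={z:+.4f}  ({verdict} the 1.96 cut-off)")
```

Since the two extremes fall within the cut-off, every sample proportion between them does as well.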

Table 2: Sample proportions and their significance to estimate population proportion of samples constituting 20 tosses of a coin









Sample proportions of 0.2808 or smaller and of 0.7192 or larger are statistically significant estimates of the population proportion of 0.50, given the sample size of 20 tosses of a coin. These limits are 0.50 minus and plus 1.96 times the SE of 0.111803.

If the sample size is increased, sample proportions closer to 0.50, that is, some values bigger than 0.2808 and smaller than 0.7192, also become statistically significant estimates of the population proportion of 0.50. For example, if the sample size is increased to 50 tosses, sample proportions of 0.3614 or smaller and of 0.6386 or larger are statistically significant estimates of the population proportion of 0.50. Likewise, if the sample size is increased to 100 tosses, sample proportions of 0.4020 or smaller and of 0.5980 or larger are statistically significant estimates of the population proportion of 0.50.
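A short sketch of how these limits depend on the sample size is given below. The helper name `cutoff_proportions` is hypothetical, and the limits are simply p minus and plus 1.96 standard errors; the printed values should agree with the figures in the text up to rounding in the fourth decimal place.

```python
import math

def cutoff_proportions(p, n, z_crit=1.96):
    """Return the lower and upper sample proportions that sit exactly
    z_crit standard errors away from the population proportion p."""
    se = math.sqrt(p * (1 - p) / n)
    margin = z_crit * se
    return p - margin, p + margin

for n in (20, 50, 100):
    low, high = cutoff_proportions(0.5, n)
    print(f"n={n:3d}: significant if p_hat <= {low:.4f} or p_hat >= {high:.4f}")
```

The margin shrinks with the square root of the sample size, which is why the limits move toward 0.50 as more tosses are added.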

I am curious whether John Kerrich’s observed sample proportion of heads, 0.5067 in 10,000 tosses of a coin, is a statistically significant estimate of the population proportion of 0.5. With SE = √[0.5 x 0.5/10,000] = 0.005, the z-score is (0.5067-0.5)/0.005 = 1.34, which is lower than the cut-off point of 1.96. This sample is therefore among the 95 percent of samples of 10,000 tosses whose z-scores lie between minus 1.96 and plus 1.96. Thus, this sample proportion is not a statistically significant estimate of the population proportion of 0.50.
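The Kerrich figure can be checked with a few lines of Python. The last step is my own addition rather than part of the original note: it asks, under the same normal approximation, how many tosses would be needed before a difference of 0.0067 from 0.5 reached the 1.96 cut-off.

```python
import math

p, n, p_hat = 0.5, 10_000, 0.5067

se = math.sqrt(p * (1 - p) / n)   # standard error for 10,000 tosses
z = (p_hat - p) / se              # z-score of Kerrich's proportion
print(f"SE = {se:.4f}, z = {z:.2f}, beyond 1.96: {abs(z) > 1.96}")

# Assumption for illustration: the observed difference stays fixed while n grows.
# Solve 1.96 = (p_hat - p) / sqrt(p * (1 - p) / n) for n and round up.
n_needed = math.ceil((1.96 * math.sqrt(p * (1 - p)) / (p_hat - p)) ** 2)
print("tosses needed for this difference to be significant:", n_needed)
```

This illustrates the role of sample size: the same small difference that is not significant in 10,000 tosses would become significant in a sufficiently larger sample.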

Conclusion

Not all sample proportions are statistically significant estimates of the population proportion. The sample size is pivotal in determining whether a sample proportion is a statistically significant estimate of the population proportion.

Sunday, October 7, 2018

Sampling Distribution of One Sample Proportions, Statistical Note 36

Toss a coin 20 times in a sample and count the number of heads. Repeat the same process for seven samples each constituting 20 tosses. Calculate the sample proportion of heads for each sample and discuss the sampling distribution of one sample proportions.

Key Words: Population, Sample Space, Sampling Frame, (Population) Parameter, Expected Value, Proportion, Population Proportion, Sample, Subset, Random Sample, Sampling Unit, Independent Trials, (Sample) Statistic, Point Estimate, Sample Proportion, Likelihood Estimate, Maximum Likelihood Estimate, Sampling Variability, Sampling Error, Estimate, Estimator, Sample Mean, Center, Sample Variance, Sample Standard Deviation, Spread, Mean of Sample Proportions, Standard Deviation of Sample Proportions, Standard Error, Categorical Variable, Binary Random Variable, Binary Data, Binary Outcome, Bernoulli Random Variable, Binomial Random Variable, Frequency Distribution, Sampling Distribution, Shape, Normal Distribution, Law of Averages, Law of Large Numbers, Central Limit Theorem.

Introduction

Numerous samples of equal size can be drawn from a population, but the sample characteristics vary from one sample to another due to sampling variability. The sample proportion is widely used as a summary measure of a binary, or two-category, random variable. The sampling distribution of a sample proportion gives an idea of the population proportion, which usually remains unknown in the real world. Tossing a coin is an example of a binary categorical random variable that illustrates the sampling distribution of sample proportions, although the population proportion in coin-tossing experiments is already known.

Observed Data

I tossed a coin 20 times to form one sample (S) and repeated the process for seven samples (S1 to S7). Can one guess how many heads there will be in each sample? Table 1 presents the outcomes of 20 tosses of a coin in each of the seven samples.

Table 1: Outcomes in 20 tosses of a coin in each of seven samples




















Every sample of 20 tosses in Table 1 is representative of the large number of possible tosses. However, the sequence of outcomes differs from sample to sample, whether or not the number of heads out of 20 tosses is the same. The sections below discuss in detail what a large number of possible tosses means, how it is represented, and a lot more about the outcomes in Table 1.

Discussion

A population constitutes the entire set of possible cases or values. It is also referred to as the sample space. The population for coin tossing contains the results of tossing the coin a very large number of times. It is not certain how large is large. There are some records of tossing a coin a large number of times. The French naturalist Count Buffon (1707 - 1788) tossed a coin 4040 times. Likewise, the South African mathematician John Kerrich tossed a coin 10,000 times as an experiment to pass his time during internment in Denmark in the Second World War.

A parameter, also referred to as a population parameter or an expected value, is a population value that describes a characteristic of the population. Usually, the value of a parameter is unknown because the entire population is not enumerated. But in the case of tossing an unbiased coin, the parameter is already known: either head or tail appears in a toss, each with probability one half. Head and tail are mutually exclusive outcomes of a toss, and the outcome of one toss does not affect the outcome of another. The latter is referred to as Independence, the third rule of sample proportion.

A proportion is a special case of a mean for binary data. The population proportion is a population parameter. It is the ratio of the number of successes to the total number of cases in the population, that is, the probability of success ‘p’, which ranges between zero and one. If a coin is tossed, either head or tail turns up. The population proportion of success, turning up a head in a toss, is one face (head) divided by two possible faces (head or tail), that is, one half. Symbolically, the population proportion is indicated by µ=p. John Kerrich observed a proportion of heads equal to 0.5067 in 10,000 tosses of a coin.

A sample is a subset of the population selected for enumeration. A sample is drawn randomly, so that every individual object in the population has an equal chance of being selected, to avoid purposeful bias. This example of tossing a coin follows Randomization, the first rule of sample proportion. A sampling unit is an individual unit or outcome of the sample. In this case, the turning up of a head or a tail in a toss of a coin is a sampling unit.

A statistic, also referred to as a (sample) statistic, is a sample value or measure that describes the sample, such as the sample mean and sample variance. Statistics vary from one sample to another due to sampling variability. A statistic used to estimate an unknown parameter is called a point estimate.

The sample proportion is the observed number of successes out of the total sample size. It is a random variable that takes a value between zero and one. The sample proportion, denoted by ‘p^’ and read as ‘p-hat’, is the number of successes divided by the sample size ‘n’. The ‘hat’ indicates ‘estimate of’. Thus ‘p^’, a statistic, is an estimate of the parameter ‘p’. In this example, it is the number of heads out of the total tosses in a sample. The expected value of the sample proportion is equal to the population proportion. In this example, the sample proportion of heads in 20 tosses of a coin ranged from 0.35 to 0.65 (Table 2). Some sample proportions were smaller than the population proportion while others were equal or larger due to sampling variability or error.

Table 2: Sample proportion of heads in 20 tosses of a coin in each of seven samples



Sample proportions ranging from zero to one are all likelihood estimates of the population proportion. In a sample of size ‘n’, the sample proportion can take the values 0, 1/n, 2/n and so on up to one. A single Bernoulli trial has a sample proportion of either zero or one. As the sample size increases, sample proportions close to the population proportion become more and more likely to occur. For an observed sample, the sample proportion is the value of ‘p’ under which the observed outcome is most probable, and it is called the Maximum Likelihood Estimate of the population proportion. In this example, the population proportion in coin tossing is 0.5, and a sample proportion of 0.5 is the single most likely outcome of 20 tosses; other sample proportions are less likely.
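To see how the likelihood of each sample proportion varies, the binomial probabilities for 20 tosses of a fair coin can be listed. This is a minimal sketch that assumes independent tosses with p = 0.5; the function name `binomial_pmf` is my own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.5
probs = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
most_likely = max(probs, key=probs.get)

# Show the proportions from 0.35 to 0.65, the range observed in the samples.
for k in range(7, 14):
    print(f"heads={k:2d}  p_hat={k / n:.2f}  probability={probs[k]:.4f}")
print("most likely number of heads:", most_likely)
```

The probabilities peak at 10 heads, a sample proportion of 0.5, and fall away symmetrically on either side.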

Mean of sample proportions ‘p^’, also called center, is the population proportion ‘p’. Symbolically, it is indicated by µp^=p. In coin tossing, 0.5 is the mean of sample proportions or the population proportion.

The standard deviation of sample proportions is the square root of the population proportion multiplied by one minus the population proportion, divided by the sample size. It is referred to as the spread or Standard Error (SE) of the sample proportion, that is, of its sampling distribution. Symbolically, σp^ = √[p x (1-p)/n], where 'p' is the population proportion and 'n' is the sample size.

In this example, SE = √[0.5 x (1-0.5)/20] = 0.111803. It indicates that a sample proportion typically differs from the population proportion by about 0.111803. Sample proportions one SE from the population proportion are 0.388197 and 0.611803, and sample proportions two SEs from it are 0.276394 and 0.723606. According to the normal distribution, about 68 percent of samples of equal size will have the population proportion within one SE of the sample proportion, and about 95 percent of samples will have it within two SEs. In this case, sample proportions between 0.276394 and 0.723606 have the population proportion of 0.50 within two SEs. The sample proportions of 0.35 to 0.65 observed in this example lie inside that range, so the samples of 20 tosses each are not surprising and fall among the 95 percent of samples of that size, provided the sample size is adequate for the normal distribution to apply. It means that I am 95 percent sure that the population proportion lies within two SEs of a sample proportion, which holds for sample proportions between 0.276394 and 0.723606. However, the sample size is a decisive factor in the SE. The proportions of bigger samples are less spread out than those of smaller samples.
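The one-SE and two-SE limits quoted above can be reproduced as follows. This is a small sketch; the 68 and 95 percent labels are the usual normal-distribution approximations rather than exact figures for 20 tosses.

```python
import math

p, n = 0.5, 20
se = math.sqrt(p * (1 - p) / n)
print(f"SE = {se:.6f}")

# k standard errors either side of the population proportion
for k, share in ((1, "about 68%"), (2, "about 95%")):
    low, high = p - k * se, p + k * se
    print(f"within {k} SE ({share} of samples): {low:.6f} to {high:.6f}")
```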

Sampling distribution is the frequency distribution of sample statistics. A sampling distribution lists the possible values of a statistic. The frequency distribution of sample proportions is called sampling distribution of sample proportions.

Sample proportions close to the population proportion are more likely to occur, and sample proportions farther from the population proportion are less likely to occur. Thus, the sampling distribution of sample proportions bulges in the middle, close to the population proportion, and tapers farther from it, so its shape comes close to that of the normal distribution.
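The shape described here can be seen in a quick simulation. This is a sketch under my own assumptions, not the author's experiment: a pseudo-random generator stands in for the coin, 10,000 samples of 20 tosses are drawn, and the seed is arbitrary and only makes the run repeatable.

```python
import random
from collections import Counter

random.seed(36)  # arbitrary seed, chosen only for repeatability

def sample_proportion(n_tosses, p_head=0.5):
    """Simulate n_tosses of a coin and return the proportion of heads."""
    heads = sum(random.random() < p_head for _ in range(n_tosses))
    return heads / n_tosses

proportions = [sample_proportion(20) for _ in range(10_000)]
counts = Counter(proportions)

# Crude text histogram: one '#' per 50 simulated samples
for value in sorted(counts):
    print(f"{value:.2f} {'#' * (counts[value] // 50)}")

mean = sum(proportions) / len(proportions)
print(f"mean of simulated sample proportions: {mean:.4f}")
```

The bars should be longest near 0.50 and shorten toward both ends, and the mean of the simulated proportions should sit close to the population proportion.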

Sample size determines the accuracy of the estimation of the population parameter. The larger the sample, the smaller the variability. In the formula, SE is inversely proportional to the square root of the sample size. This is related to the Law of Large Numbers. One may be interested to know how large a large sample is. If the product of the sample size and the population proportion, and the product of the sample size and one minus the population proportion, are both greater than or equal to five, the sampling distribution of the sample proportion is said to follow the normal distribution. Symbolically, the rule of thumb for normality has two criteria: 1) the expected number of heads (n*p) >= 5 and 2) the expected number of tails [n*(1-p)] >= 5. Then ‘p^’ approximately follows the normal distribution with mean p and standard deviation equal to √[p x (1-p)/n]. This result follows from the Central Limit Theorem. Others argue that n*p and n*(1-p) should be greater than or equal to 10. This is called Normality, the second rule of sample proportion.

This example sample of 20 tosses meets both criteria for the normal distribution. Both n*p and n*(1-p) are 10. Thus, we can conclude that the sampling distribution of sample proportions will be approximately normal with mean of sample proportions p=0.5 and standard error of sample proportions σp^=0.111803.
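The rule of thumb can be written as a small check. The function name `meets_normality_rule` and its `threshold` parameter are hypothetical; the threshold is set to 5 or 10 depending on which version of the rule one prefers.

```python
def meets_normality_rule(n, p, threshold=5):
    """True when both the expected successes n*p and the expected
    failures n*(1-p) reach the chosen threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(meets_normality_rule(20, 0.5))                # n*p = n*(1-p) = 10
print(meets_normality_rule(20, 0.5, threshold=10))  # stricter version of the rule
print(meets_normality_rule(8, 0.5))                 # n*p = 4, too few tosses
```

A sample of 20 tosses of a fair coin satisfies both versions of the rule, whereas a sample of 8 tosses would not.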

Conclusion

Population parameters are usually unknown. A sample statistic is used to estimate the population parameter; the sample proportion is used to estimate the population proportion. The sampling distribution of the sample proportion gives an idea of sampling variability. By the Central Limit Theorem, the sampling distribution of the sample proportion in tossing a coin is approximately normal when a sample constitutes 20 tosses. The larger the sample size, the smaller the spread of the distribution. The observed sample proportions were among the 95 percent of samples of 20 tosses of a coin. In other words, I am 95 percent sure that the population proportion lies within two SEs of a sample proportion, which holds for sample proportions between 0.276394 and 0.723606, although I already know that the population proportion is 0.5.