Toss a coin 20 times in a sample and count the number of heads. Repeat the same process for seven samples of 20 tosses each.
Calculate the sample proportion of heads for each sample and discuss the sampling distribution of sample proportions.
Key Words: Population, Sample Space, Sampling Frame, (Population) Parameter, Expected
Value, Proportion, Population Proportion, Sample, Subset, Random Sample, Sampling
Unit, Independent Trials, (Sample) Statistic, Point Estimate, Sample
Proportion, Likelihood Estimate, Maximum Likelihood Estimate, Sampling
Variability, Sampling Error, Estimate, Estimator, Sample
Mean, Center, Sample Variance, Sample Standard Deviation, Spread, Mean of Sample
Proportions, Standard Deviation of Sample Proportions, Standard Error, Categorical
Variable, Binary Random Variable, Binary Data, Binary Outcome, Bernoulli Random
Variable, Binomial Random Variable, Frequency Distribution, Sampling
Distribution, Shape, Normal Distribution, Law of Averages, Law of Large
Numbers, Central Limit Theorem.
Introduction
Many samples of equal size can be drawn from a population, but
sample characteristics vary from one sample to another due to sampling variability.
The sample proportion is widely used as a summary measure of a binary (two-category) random variable.
The sampling distribution of a sample
proportion gives an idea of the population proportion, which usually remains unknown in the real
world. Tossing a coin is an example of a binary categorical random variable that illustrates the sampling distribution of sample proportions, even though the population proportion in coin-tossing experiments is already known.
Observed Data
I tossed a coin 20 times in a
sample (S) and repeated the same process for seven samples (S1 to S7). Can
one guess how many heads there will be in each sample? Table 1 presents the
outcome of 20 tosses of a coin in each of the seven samples.
Every sample of 20 tosses in Table 1 is representative of the large number of possible tosses. However, the sequence of outcomes differs from sample to sample, whether or not the number of heads out of 20 tosses is the same. The sections below discuss in detail what the large number of possible tosses means, how it is represented, and more about the outcomes in Table 1.
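The experiment can also be imitated on a computer. The following is a minimal Python sketch, assuming a fair coin (probability of heads 0.5); the function name toss_samples and the seed value are illustrative choices, and the simulated outcomes are not the ones recorded in Table 1.

```python
import random

def toss_samples(num_samples=7, tosses_per_sample=20, p_heads=0.5, seed=None):
    """Simulate coin-toss samples; each toss is recorded as 'H' or 'T'."""
    rng = random.Random(seed)
    return [
        ["H" if rng.random() < p_heads else "T" for _ in range(tosses_per_sample)]
        for _ in range(num_samples)
    ]

# The seed is arbitrary and only makes the run repeatable.
samples = toss_samples(seed=1)
for i, sample in enumerate(samples, start=1):
    print(f"S{i}: {''.join(sample)}  heads = {sample.count('H')}")
```

Running it with a different seed gives different sequences and, usually, different head counts, which is the sampling variability discussed in the rest of this post.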
Discussion
A population constitutes the entire
set of possible cases or values. It is also referred to as the sample space.
The population for coin tossing contains the results of tossing the coin a
very large number of times. It is not clear how large is large, but there are
some records of tossing a coin a large number of times. The French naturalist
Count Buffon (1707-1788) tossed a coin 4,040 times. Likewise, the South African
mathematician John Kerrich tossed a coin 10,000 times as an experiment to pass
his time while interned in Denmark during the Second World War.
A parameter, also referred to as a population parameter or
an expected value, is a population value that describes a characteristic
of the population. Usually, the value of a parameter is unknown because the entire
population is not enumerated. But in the case of tossing an unbiased coin, the
parameter value is already known: either a head or a tail appears in a toss, each with probability one half.
Heads and tails are mutually exclusive within a toss, and the outcome of one toss does not
affect the outcome of any other. The latter property is referred to as Independence, the third
rule of sample proportion.
A proportion is a special
case of the mean for binary data. The population proportion is a population
parameter. It is the ratio of the number of successes to the
total number of cases in the population. It is the probability of success ‘p’,
which ranges between zero and one. If a coin is tossed, either a head or a tail
turns up. The population proportion of success, a head turning up in a toss, is
one face (head) divided by two possible faces (head or tail), that is, one half. Symbolically,
the population proportion is indicated by µ = p. John Kerrich observed a proportion
of heads equal to 0.5067 in 10,000 tosses of a coin.
A sample is a subset
of the population selected for enumeration. A sample is drawn randomly, so that
every individual object in the population has an equal
chance of being selected, to avoid purposeful bias. This example of tossing a coin follows Randomization,
the first rule of sample proportion. A sampling unit is an individual unit or outcome
of the sample. In this case, the turning up of a head or tail in a single toss of a coin
is a sampling unit.
A statistic, also referred
to as a (sample) statistic, is a sample value or measure that describes
the sample, such as the sample mean and the sample variance. Statistics vary from one
sample to another due to sampling variability. A statistic is used to estimate an unknown
parameter; its value in a given sample is called a point estimate.
The sample proportion is
the observed number of successes divided by the total sample size. It is a random
variable that takes values between zero and one. The sample proportion, denoted
by ‘p̂’ and read as ‘p-hat’, is the
number of successes divided by the sample size ‘n’. The ‘hat’ indicates
‘estimate of’. Thus ‘p̂’, a statistic, is
an estimate of the parameter ‘p’. In this example, it is the number of
heads divided by the total number of tosses in a sample. The expected value of the sample
proportion is equal to the population proportion. In this example, the sample
proportion of heads in 20 tosses of a coin ranged from 0.35 to 0.65 (Table 2).
Some sample proportions were smaller than the population proportion, while
others were equal to or larger than it, due to sampling variability or sampling error.
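The calculation of a sample proportion can be sketched in a few lines of Python. The helper name sample_proportion and the 20-toss sequence below are hypothetical, made up for illustration; they are not taken from Table 1 or Table 2.

```python
def sample_proportion(outcomes, success="H"):
    """Return p-hat: the number of successes divided by the sample size."""
    if not outcomes:
        raise ValueError("sample must contain at least one outcome")
    return sum(1 for outcome in outcomes if outcome == success) / len(outcomes)

# Hypothetical sample of 20 tosses (illustration only).
example = list("HTHHTTHTHHTHTTHHTTTT")
print(sample_proportion(example))  # 9 heads out of 20 -> 0.45
```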
Table 2: Sample proportion of heads in 20 tosses of a coin in each of seven samples
Sample proportions ranging from zero to one are all likelihood estimates
of the population proportion. Theoretically, any sample proportion
from zero to one is possible, in steps of one divided by the sample size. A single Bernoulli trial (n = 1) will have a sample
proportion of either zero or one. As the sample size increases, sample
proportions close to the population proportion become highly likely to occur. Among
all possible values, the sample proportion equal to the population proportion is the most likely to be observed.
For a given sample, the observed sample proportion is the Maximum Likelihood Estimate of the population proportion,
that is, the value of ‘p’ under which the observed data are most probable.
Other values of ‘p’ are less likely given that sample.
In this example, the population proportion in coin tossing is 0.5, so a sample with 10 heads in 20 tosses is the most probable outcome, and its sample
proportion of 0.5 is also its maximum likelihood estimate.
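The likelihood idea can be checked numerically. The sketch below assumes a hypothetical sample with 10 heads in 20 tosses, evaluates the binomial likelihood of each candidate population proportion on a grid from 0.00 to 1.00, and picks the candidate with the highest likelihood. The function name and the grid spacing are illustrative assumptions.

```python
from math import comb

def binomial_likelihood(p, heads, n):
    """Likelihood of population proportion p, given `heads` successes in n tosses."""
    return comb(n, heads) * p**heads * (1 - p) ** (n - heads)

n, heads = 20, 10  # assumed example: 10 heads in 20 tosses
candidates = [i / 100 for i in range(101)]  # candidate values of p: 0.00, 0.01, ..., 1.00
mle = max(candidates, key=lambda p: binomial_likelihood(p, heads, n))
print(mle)  # 0.5, the same as the sample proportion heads / n
```

Changing heads to another count, say 7, moves the maximum to 7/20 = 0.35, which shows that the maximizing value follows the observed sample proportion.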
The mean of sample proportions ‘p̂’, also called the center, is the population proportion ‘p’. Symbolically,
it is indicated by µp̂ = p. In coin tossing, 0.5 is the mean of sample
proportions, that is, the population proportion.
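That the mean of sample proportions equals the population proportion can be verified exactly rather than taken on trust. The sketch below assumes the number of heads in n independent tosses follows a binomial distribution and averages every possible sample proportion k/n, weighted by its probability. The function name is an illustrative choice.

```python
from math import comb

def mean_of_sample_proportions(n, p):
    """Exact expected value of p-hat = K/n, where K ~ Binomial(n, p)."""
    return sum(
        (k / n) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )

print(round(mean_of_sample_proportions(20, 0.5), 6))  # 0.5
```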
The Standard Deviation of Sample
Proportions is the square root of the population proportion
multiplied by one minus the population proportion, divided by the sample size. This is
referred to as the spread or Standard Error (SE) of the
sample proportion, that is, of its sampling distribution. Symbolically, σp̂ = √[p × (1 − p)/n],
where 'p' is the population proportion and 'n' is the sample size.
In this example, SE is calculated
to be √[0.5 × (1 − 0.5)/20], equal to 0.111803. It indicates that a sample proportion
typically differs from the population proportion by about 0.111803. Sample proportions within one SE of the
population proportion lie between 0.388197 and 0.611803. According to the normal distribution, about 68 percent of samples of equal size will have sample proportions within one SE of the population proportion. Likewise, about 95 percent of samples of equal size will have sample proportions within two SEs, in this case between 0.276393 and 0.723607. Equivalently, for those samples the population proportion of 0.50 lies within two SEs of the sample proportion. The sample proportions of the seven samples of 20 tosses in
this example, which ranged from 0.35 to 0.65, are therefore not surprising: they fall among the 95 percent of samples of that size,
provided the sample size is adequate for the normal distribution to apply. It means that I am about 95 percent sure that the sample proportion of a sample of 20 tosses will fall between 0.276393 and 0.723607. However, sample
size is a decisive factor in the SE: the sample proportions of bigger
samples are less spread out than those of smaller samples.
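The arithmetic above can be reproduced with a short Python sketch. It computes the SE and the one-SE and two-SE ranges for p = 0.5 and n = 20 and, as an additional check that is not part of the original calculation, the exact binomial probability that a sample proportion lands inside each range. The function names are illustrative assumptions.

```python
from math import comb, sqrt

def standard_error(p, n):
    """SE of the sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

def prob_within(p, n, num_se):
    """Exact binomial probability that p-hat lies within num_se SEs of p."""
    se = standard_error(p, n)
    low, high = p - num_se * se, p + num_se * se
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if low <= k / n <= high
    )

p, n = 0.5, 20
se = standard_error(p, n)
print(f"SE = {se:.6f}")
print(f"within 1 SE: {p - se:.6f} to {p + se:.6f}, probability {prob_within(p, n, 1):.4f}")
print(f"within 2 SE: {p - 2 * se:.6f} to {p + 2 * se:.6f}, probability {prob_within(p, n, 2):.4f}")
```

With only 20 tosses the sample proportion can take just 21 distinct values, so the exact probabilities differ somewhat from the 68 and 95 percent given by the normal distribution.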
A sampling distribution is the frequency distribution of a sample
statistic. It lists the possible values of the
statistic together with how often each occurs. The frequency distribution of sample proportions is called the sampling
distribution of sample proportions.
Sample proportions close to the population proportion are more likely to
occur, and sample proportions farther from the population proportion are less likely to occur.
Thus, the sampling distribution of sample proportions bulges
in the middle, close to the population proportion, and tapers farther away from
it, so its shape comes close to that of the normal distribution.
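The bulge in the middle and the tapering tails can be seen by listing the distribution. The sketch below assumes independent tosses of a fair coin, so that the number of heads is binomial, and prints each possible sample proportion for n = 20 with its exact probability and a rough text bar. The function name and the bar scaling are illustrative choices.

```python
from math import comb

def sampling_distribution(n, p):
    """Map each possible p-hat = k/n to its exact binomial probability."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

dist = sampling_distribution(20, 0.5)
for p_hat, prob in dist.items():
    # One '#' per 0.005 of probability, rounded, as a crude histogram bar.
    print(f"{p_hat:4.2f}  {prob:6.4f}  {'#' * round(prob * 200)}")
```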
Sample size determines the
accuracy of the estimation of the population parameter. The larger the sample,
the smaller the variability. In the formula, SE is inversely
proportional to the square root of the sample size. The tendency of the sample proportion to settle close to the population proportion as the sample grows
is referred to as the Law of Large Numbers. One may be interested to know how large a
large sample is. If the product of the sample size and the population proportion and
the product of the sample size and one minus the population proportion are both greater than or
equal to five, the sampling distribution of the sample proportion is said to follow the normal
distribution. Symbolically, the rule of thumb for normality has two criteria: 1) the expected number of heads (n*p) >= 5 and 2) the expected number of tails [n*(1-p)] >= 5. When both hold, p̂ approximately follows a normal distribution with mean p and
standard deviation equal to √[p × (1 − p)/n]. This result follows from the Central Limit
Theorem. Others argue that n*p and n*(1-p) should be greater than or equal
to 10. This is called Normality, the second rule of sample
proportion.
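How the spread shrinks with sample size can be read directly from the SE formula. In the sketch below, the sample sizes beyond 20 are hypothetical values chosen so that each is four times the previous one; because n sits under the square root, each fourfold increase halves the SE.

```python
from math import sqrt

def standard_error(p, n):
    """SE of the sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

# Each sample size is four times the previous one (illustrative values).
for n in (20, 80, 320, 1280):
    print(f"n = {n:5d}  SE = {standard_error(0.5, n):.6f}")
```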
This example, with samples of 20 tosses, meets both
criteria for the normal approximation: n*p and n*(1-p) are both 10. Thus,
we can conclude that the sampling distribution of sample proportions will be
approximately normal, with mean of sample proportions µp̂ = p = 0.5 and standard error of sample proportions σp̂ = 0.111803.
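The rule-of-thumb check is simple enough to write down as code. In this sketch the function name normal_approximation_ok and its threshold parameter are illustrative assumptions; the threshold defaults to five and can be set to 10 for the stricter version of the rule.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Check the rule of thumb: n*p and n*(1-p) must both reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(normal_approximation_ok(20, 0.5))                # True: 10 and 10
print(normal_approximation_ok(20, 0.5, threshold=10))  # True: 10 and 10
print(normal_approximation_ok(20, 0.1))                # False: only 2 expected successes
```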
Conclusion
Population parameters are
usually unknown. A sample statistic is used to estimate the population parameter.
The sample proportion is used to estimate the population proportion. The sampling
distribution of the sample proportion gives an idea of sampling variability. By the Central
Limit Theorem, the sampling distribution of the sample proportion in coin tossing is approximately
normal when a sample constitutes 20 tosses of a coin.
The larger the sample size, the smaller the spread of the
distribution. The observed sample proportions were
among the 95 percent of samples of 20 tosses of a coin that are expected to fall within two SEs of the population proportion. In other words, I am about 95 percent sure that the sample proportion of a sample of 20 tosses will fall between 0.276393 and 0.723607, given that the population proportion is already known to be 0.5.