Among 40 participants in a training, 8 were youths (less
than 30 years), 25 were adults (30 to 59 years) and the remaining 7 were senior
citizens (60 or more years). Two participants are selected at random, one after
another, without replacing the name of the first selected participant.
Calculate the probability distribution of the number of youths selected, where
it does not matter whether a youth is drawn first or second.
Calculating the probability of a favorable event requires counting both the
total number of outcomes and the number of favorable outcomes when a specified
number of objects is sampled without replacement from a finite population. A
tree diagram helps to visualize and count the outcomes and to calculate their
probabilities. As the number of selections increases, however, the diagram
becomes complicated and it is no longer practical to show every outcome. A
formula is then used to count the outcomes and calculate the discrete
probability distribution. In this note I discuss how to visualize and
calculate the multi-category discrete probability distribution for sampling
without replacement using both the tree diagram and the formula.
Tree Diagram
This example has three categories, so the participant selected
in any draw is a youth (Y), an adult (A) or a senior citizen (S).
Two participants are selected consecutively and the first is not returned
before the second draw, which is referred to as sampling without replacement.
The probability of each outcome is presented in Diagram 1. For a detailed
discussion of how these probabilities are derived, please refer to my statistical note 6.
Diagram 1: First and second steps showing marginal and conditional probabilities
(multi-category discrete probability distribution of sampling without
replacement)
The nine outcomes are the rightmost cells in Diagram 1.
One outcome (YY) in the red cell has youths sampled in both selections.
Two outcomes (YA and AY) in dark red cells and
two outcomes (YS and SY) in dark green cells have one youth and one other participant
sampled in the two selections. One outcome (AA) in the purple cell, two outcomes (AS
and SA) in grey cells and one outcome (SS) in the yellow cell have no youth sampled
in the two selections.
The joint probability that both the first and second
selected participants are youths, denoted by P(Y∩Y) or P(YY), is the product of
P(Y) and P(Y|Y), that is (8/40) X (7/39) = 56/1560 = 0.036. The other
joint probabilities are calculated in the same way.
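The product rule used for P(YY) can be applied to every branch of the tree. The following is a minimal sketch, assuming the category counts 8, 25 and 7 that appear in the calculations of this note; the names `counts` and `joint_probability` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Category counts as used in the worked calculations (assumed: 8 Y, 25 A, 7 S).
counts = {"Y": 8, "A": 25, "S": 7}
N = sum(counts.values())  # 40 participants


def joint_probability(first, second):
    """P(first and second) = P(first) x P(second | first), without replacement."""
    p_first = Fraction(counts[first], N)
    # If both draws are from the same category, one member is already gone.
    remaining = counts[second] - (1 if second == first else 0)
    p_second_given_first = Fraction(remaining, N - 1)
    return p_first * p_second_given_first


# The nine ordered outcomes: YY, YA, YS, AY, AA, AS, SY, SA, SS.
joint = {a + b: joint_probability(a, b) for a, b in product(counts, repeat=2)}

for outcome, p in joint.items():
    print(outcome, p, round(float(p), 3))
```

Exact fractions are used so that the nine joint probabilities can be checked to add up to exactly 1 before rounding.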
Let X be a random variable that takes the value 0, 1 or 2
as the number of youths sampled without replacement in two selections. The
values of the probability mass function (PMF) are calculated below and presented in Table 1:
P(X=0) = P(AA)+P(AS)+P(SA)+P(SS) = [(25 X 24) + 2(25 X 7) + (7 X 6)] / (40 X 39) = (600+350+42) / 1560 = 992 / 1560 = 0.636
P(X=1) = P(YA)+P(YS)+P(AY)+P(SY) = [2(8 X 25) + 2(8 X 7)] / (40 X 39) = (400+112) / 1560 = 512 / 1560 = 0.328
P(X=2) = P(YY) = (8 X 7) / (40 X 39) = 56 / 1560 = 0.036
Table 1: Multi-Category Discrete probability distribution of youths sampled
without replacement
The probability distribution shows that there is a
63.6 percent chance that no youth will be among the two participants
selected, a 32.8 percent chance that exactly one of the two will be a youth
and a 3.6 percent chance that both will be youths.
Adding the probabilities, there is a 96.4 percent chance that at most one youth
will be selected and a 100 percent chance that two or fewer youths will be
selected when two participants are drawn without replacement.
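The distribution and its cumulative values can also be checked by brute force. The sketch below assumes the counts 8, 25 and 7 used in the calculations and treats every ordered pair of distinct participants as equally likely; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

# Population of 40 labelled participants (assumed counts: 8 Y, 25 A, 7 S).
population = ["Y"] * 8 + ["A"] * 25 + ["S"] * 7

# Every ordered pair of distinct participants: 40 x 39 = 1560 pairs.
pairs = list(permutations(range(len(population)), 2))

# Tally the probability of each number of youths (0, 1 or 2).
pmf = defaultdict(Fraction)
for i, j in pairs:
    youths = (population[i] == "Y") + (population[j] == "Y")
    pmf[youths] += Fraction(1, len(pairs))

# Print x, P(X=x) and the cumulative probability P(X<=x).
cumulative = Fraction(0)
for x in sorted(pmf):
    cumulative += pmf[x]
    print(x, round(float(pmf[x]), 3), round(float(cumulative), 3))
```

Enumeration is only feasible for small examples like this one, which is the reason a formula is needed when the number of selections grows.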
This example has four characteristic
features. First, it has a finite population of 40 participants,
denoted by ‘N’. Second, there are more than two possible categories that are
mutually exclusive in each trial or experiment. Third, the second trial
depends on the outcome of the first trial, which is referred to as sampling without
replacement. Fourth, each category can occur ‘0’ to ‘n’ times in ‘n’ trials.
Let Xi be a random variable
of interest that counts how many times the ith category is sampled without
replacement. It takes a value from ‘0’ to ‘n’, denoted by ‘ni’, and ‘i’ runs
from 1 to k, the number of categories
of response. The joint probability distribution of X1, X2, … Xk is given by the
expression
P(X1=n1, X2=n2, X3=n3, … Xk=nk) = [C(N1,n1) X C(N2,n2) X … X C(Nk,nk)] / C(N,n)
where,
N is the population size,
Nk is the size of the kth category in the population,
n is the total number of draws or the sample size,
nk is the number of draws from the kth category,
C(N,n) is the number of combinations of a sample of size n drawn from a population of size N, also referred to as the binomial coefficient. For clarity on combinations and the binomial coefficient, refer to my statistical notes 10 to 16.
This distribution is referred to as the multivariate hypergeometric distribution.
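The expression translates almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `multivariate_hypergeom_pmf` and its argument names are my own labels, not part of any library.

```python
from fractions import Fraction
from math import comb, prod


def multivariate_hypergeom_pmf(category_sizes, draws):
    """P(X1=n1, ..., Xk=nk) = [C(N1,n1) x ... x C(Nk,nk)] / C(N,n)."""
    if len(category_sizes) != len(draws):
        raise ValueError("need one draw count per category")
    N = sum(category_sizes)  # population size
    n = sum(draws)           # sample size
    # Ways to pick ni members from each category, multiplied across categories.
    numerator = prod(comb(Ni, ni) for Ni, ni in zip(category_sizes, draws))
    return Fraction(numerator, comb(N, n))


# Example: one youth and one adult among two participants (assumed counts 8, 25, 7).
p = multivariate_hypergeom_pmf([8, 25, 7], [1, 1, 0])
print(p, round(float(p), 3))
```

Because `math.comb(Ni, ni)` returns 0 when ni exceeds Ni, a draw vector that asks for more members than a category contains receives probability 0.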
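In this example, N is equal to 40 and n
is equal to 2. Each of Y, A and S can occur 0 to 2 times. The outcomes are grouped
by the number of participants from each category, with X1, X2 and X3 counting
youths, adults and senior citizens respectively, and the value of the
Probability Mass Function (PMF) is calculated for each group as below: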
Group One Outcome:
One outcome (AA) has adults sampled in
both selections. The PMF is calculated as:
P(X1=0, X2=2, X3=0)
= [C(8,0) X C(25,2) X C(7,0)]/C(40,2) = (25 X 24) / (40 X 39) = 0.385
Group Two Outcome:
One outcome (SS) has senior citizens sampled
in both selections. The PMF is calculated as:
P(X1=0, X2=0, X3=2)
= [C(8,0) X C(25,0) X C(7,2)]/C(40,2) = (7 X 6) / (40 X 39) = 0.027
Group Three Outcomes:
Two outcomes (AS and SA) have an adult
and a senior citizen sampled in two selections. Because the formula counts
combinations, it covers both orders together, which is why the factor 2 appears
when the result is written over the 40 X 39 ordered pairs. The PMF is calculated as:
P(X1=0, X2=1, X3=1)
= [C(8,0) X C(25,1) X C(7,1)]/C(40,2) = (2 X 7 X 25) / (40 X 39) = 0.224.
Group Four Outcomes:
Two outcomes (YA and AY) have a youth
and an adult sampled in two selections. The formula covers both orders
together, hence the factor 2 over the 40 X 39 ordered pairs, and the PMF is
calculated as:
P(X1=1, X2=1, X3=0)
= [C(8,1) X C(25,1) X C(7,0)]/C(40,2) = (2 X 8 X 25) / (40 X 39) = 0.256.
Group Five Outcomes:
Two outcomes (YS and SY) have a youth
and a senior citizen sampled in two selections. The formula covers both orders
together, hence the factor 2 over the 40 X 39 ordered pairs, and the PMF is
calculated as:
P(X1=1, X2=0, X3=1)
= [C(8,1) X C(25,0) X C(7,1)]/C(40,2) = (2 X 8 X 7) / (40 X 39) = 0.072.
Group Six Outcomes:
One outcome (YY) has youths sampled in
both selections. The PMF is calculated as:
P(X1=2, X2=0, X3=0)
= [C(8,2) X C(25,0) X C(7,0)]/C(40,2) = (8 X 7) / (40 X 39) = 0.036
The probabilities of group one to
three outcomes are added to calculate the probability that no youth is sampled in either selection,
which is equal to 0.636, the first row probability in Table 1. The probabilities
of group four and five outcomes are added to calculate the probability that one youth is sampled in two selections, which
is equal to 0.328, the second row probability in Table 1. The probability of
the group six outcome, 0.036, is equal to the third row probability in Table 1.
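The grouping and adding described above can be reproduced in a few lines. This is a sketch under the same assumed counts (8 youths, 25 adults, 7 senior citizens); `groups` and `table_1` are illustrative names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

sizes = (8, 25, 7)  # youths, adults, senior citizens (assumed counts)
n = 2               # participants selected
total = comb(sum(sizes), n)  # C(40,2) = 780

# All count vectors (n1, n2, n3) with n1 + n2 + n3 = n: the six groups.
groups = {
    x: Fraction(comb(sizes[0], x[0]) * comb(sizes[1], x[1]) * comb(sizes[2], x[2]), total)
    for x in product(range(n + 1), repeat=3)
    if sum(x) == n
}

# Add the group probabilities that share the same number of youths (n1).
table_1 = defaultdict(Fraction)
for (n1, _, _), p in groups.items():
    table_1[n1] += p

for youths in sorted(table_1):
    print(youths, round(float(table_1[youths]), 3))
```

Summing over the adult and senior citizen counts leaves only the number of youths, which is how the six formula groups collapse into the three rows of Table 1.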
The above processes show that the tree diagram and the formula produce the same values, and both are useful for calculating the multi-category discrete probability distribution of samples drawn without replacement.