Among 40 participants in a training, 8 were youths (less
than 30 years), 25 were adults (30 to 59 years) and remaining 7 were senior
citizens (60 or more years). 2 participants are selected at random one after
another without replacement of the name of the first selected participant.
Calculate the probability distribution of youths in which the order does not matter
whether the youths are sampled or not in the first or second draw.
Counting
the total and favorable numbers of outcomes constituting the specified number
of objects sampled without replacement from the finite population is important
to calculate the probability of favorable events. Excel software is commonly
available in the desktop or the laptop and is an important means to calculate
the discrete probability distribution. I take an example from my Statistical
Note 20, and will show how calculate the multicategory discrete probability
distribution without replacement using Excel function. Refer to that note for
conceptual clarity on using the tree diagram and formula to calculate the Multivariate
Hypergeometric probability distribution.
EXCEL
Software
EXCEL
software does not have an inbuilt function to calculate Multivariate
Hypergeometric probability. One inbuilt EXCEL function is used to calculate the
Multivariate Hypergeometric probability – COMBIN function.
First,
I will prepare the dataset for an outcome constituting one youth and one adult in
two participants sampled without replacement as shown in Diagram 1. It is the
case of group two outcomes discussed under Formula section in the Statistical
Note 20.
Diagram
1: Dataset showing the number of participants
by age category and two participants sampled without replacement (one youth and
one adult)
Referring
to the Multivariate Hypergeometric distribution formula, the COMBIN function is
used for each category of participants in the population and in the sample drawn
without replacement as shown in Diagram 2. The COMBIN function has two fields -
‘Number’ and ‘Number_choosen’. In Diagram 2, ‘Number’ takes the value as the number
of items by certain category in the population. In this example, eight are
youths among 40 participants, and one of two participants sampled is youth as shown
in cells B2 and B6 respectively in Diagram 2. The same formula is applied to
all categories of participants and total participants.
Diagram
2: Application of COMBIN Function in Excel
The
PRODUCT function is applied to calculate the product of a series of numbers.
The function has several fields – ‘Number1’, ‘Number2’. Referring to the
Multivariate Hypergeometric distribution formula, in this example, the ‘Number1’
field takes a series of values from cells B10 to B12 on COMBIN values as shown
in Diagram 3.
Diagram
3: Application of PRODUCT Function in Excel
The probability
of an outcome, an occurrence of one youth and another adult in participants
sampled without replacement among 40 participants is equal to P(1,1,0)
discussed under Group Four Outcomes under Formula section of the Statistical
Note 20. This value is calculated by dividing the value in cell B14 by the
value in B13, which is equal to 0.2564 as shown in the function argument box in
Diagram 3.
Similar
process is followed to calculate the probability of each group of outcomes,
mentioned in the Statistical Note 20. Finally, adding
the probability of each group of outcomes will give the Multivariate
Hypergeometric probability distribution.
Discussion in this
note and my former Statistical Note 20 indicate that
both manual and auto calculation produce the same values and are useful to
calculate the multicategory discrete probability distribution without
replacement, Multivariate Hypergeometric Distribution. Conceptual understanding
is a backbone and automatization is efficient. Thus, both are important
knowledge and skill sets.
No comments:
Post a Comment