Among 40 participants in a training, 18 were vegetarians
and 22 were non-vegetarians. Two participants are selected at random, one after another,
without replacing the name of the first selected participant. Calculate
the probability distribution of the number of vegetarians selected, where
it does not matter whether a vegetarian is selected in the
first or the second draw.
Counting the total and favorable numbers of outcomes
when a specified number of objects is sampled without replacement from
a finite population is essential for calculating the probability
of favorable events. Excel is commonly available on desktops and
laptops and is a convenient means of calculating a discrete probability
distribution. I take an example from my Statistical Note 18 and show how to
calculate the discrete probability distribution without replacement using an Excel
function. Refer to that note for conceptual clarity on using the tree diagram
and formula to calculate the hypergeometric probability distribution.
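For reference, the standard hypergeometric formula can be written as follows. The symbols are my own labels, chosen to match the Excel fields described in this note: N is the population size (‘Number_pop’), K is the number of successes in the population (‘Population_s’), n is the sample size (‘Number_sample’) and k is the number of successes in the sample (‘Sample_s’).

```latex
P(X = k) = \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}},
\qquad
P(X = 1) = \frac{\binom{18}{1}\,\binom{22}{1}}{\binom{40}{2}} = \frac{396}{780}
```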
EXCEL Function
Excel has a ‘Formulas’ tab, which has the ‘Insert
Function’ command under the ‘Function Library’ group. Select the ‘HYPGEOM.DIST’
function from the list of all functions. The ‘HYPGEOM.DIST’ function has five fields,
‘Sample_s’, ‘Number_sample’, ‘Population_s’, ‘Number_pop’ and ‘Cumulative’, as
shown in Diagram 1.
Diagram 1: Selection of ‘HYPGEOM.DIST’ function in Excel Data Worksheet
The field ‘Sample_s’
takes the number of successes in the sample. In this example, this field takes
values from zero to two vegetarians, as indicated in cells B2 to B4 in Diagram
1, because a sample of two participants could contain no vegetarian, one
vegetarian or two vegetarians. In this calculation, one vegetarian in
two draws of participants has been used, as shown in cell B3 in
the table as well as in the argument box in Diagram 1.
The field ‘Number_sample’ is the sample size. In
this example, two participants were sampled without replacement and thus two
was entered into this field of the function arguments in Diagram 1.
The field ‘Population_s’ is the number of successes
in the population. This example has 18 vegetarians out of 40 participants. Thus,
18 was entered into the field of the function argument box in Diagram 1.
The field ‘Number_pop’ is the population
size. This example has a total of 40
participants. Thus, 40 was entered into this field of the function argument box in
Diagram 1.
The field ‘Cumulative’ is a logical value that
determines the form of the function. If ‘Cumulative’ is ‘FALSE’, ‘HYPGEOM.DIST’
calculates the probability mass function (PMF), which gives the probability associated
with the value assigned to the field ‘Sample_s’ as the number of successes. Thus,
‘FALSE’ was entered into the function argument box in Diagram 1.
With all five fields set in the function arguments, the ‘HYPGEOM.DIST’
function returned a PMF of 0.5077. It
means that there is about a 50.8 percent chance that exactly one of two participants sampled
without replacement will be a vegetarian. This is equal to the value manually calculated in my Statistical
Note 18.
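The PMF reported above can be cross-checked outside Excel. The following is a minimal Python sketch that assumes the standard counting formula for sampling without replacement. The function name `hypergeom_pmf` is hypothetical, and its parameter names simply mirror the Excel fields.

```python
from math import comb


def hypergeom_pmf(sample_s, number_sample, population_s, number_pop):
    """Probability of exactly `sample_s` successes in `number_sample` draws
    made without replacement (hypothetical helper mirroring the Excel fields)."""
    # Ways to pick the successes from the successes in the population.
    ways_success = comb(population_s, sample_s)
    # Ways to pick the remaining draws from the failures in the population.
    ways_failure = comb(number_pop - population_s, number_sample - sample_s)
    # Divide by the total number of equally likely samples.
    return ways_success * ways_failure / comb(number_pop, number_sample)


# Counterpart of HYPGEOM.DIST(1, 2, 18, 40, FALSE)
pmf_one = hypergeom_pmf(1, 2, 18, 40)
print(round(pmf_one, 4))  # 0.5077
```

Here 18 × 22 = 396 favorable samples out of 780 possible pairs gives the same value as the Excel dialog.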
The cumulative distribution function (CDF) is another
important probability calculation. If the field ‘Cumulative’ takes the
value ‘TRUE’, ‘HYPGEOM.DIST’ returns the CDF, which gives the probability
of up to the number of successes entered in the field ‘Sample_s’. In this
example, the probability of up to one vegetarian being selected is the sum of the probability
that no vegetarian is selected (both participants are non-vegetarians) and the probability
that exactly one of the two participants sampled is a vegetarian. The logical value
‘TRUE’ is used, as shown in both the function argument box and the dataset with
the cell value ‘N53’, which gives a CDF equal to 0.8038, as shown in Diagram 2.
Diagram 2: ‘HYPGEOM.DIST’
Function Arguments Using Dataset in Excel Worksheet and using ‘TRUE’ logical value in the field ‘Cumulative’
It
means that there is about an 80.4 percent chance that at most one vegetarian will be
selected among two participants sampled without replacement. This is equal to the
value manually calculated in Statistical Note 18.
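The cumulative value can be checked the same way by adding up the favorable counts for zero and one vegetarian. This is a minimal sketch, not Excel's own implementation. The name `hypergeom_cdf` is hypothetical, and the function assumes that `sample_s` does not exceed `number_sample`.

```python
from math import comb


def hypergeom_cdf(sample_s, number_sample, population_s, number_pop):
    """Probability of at most `sample_s` successes in `number_sample` draws
    without replacement (hypothetical helper; assumes sample_s <= number_sample)."""
    total = comb(number_pop, number_sample)
    # Sum the favorable counts for 0, 1, ..., sample_s successes.
    favorable = sum(
        comb(population_s, k) * comb(number_pop - population_s, number_sample - k)
        for k in range(sample_s + 1)
    )
    return favorable / total


# Counterpart of HYPGEOM.DIST(1, 2, 18, 40, TRUE)
cdf_one = hypergeom_cdf(1, 2, 18, 40)
print(round(cdf_one, 4))  # 0.8038
```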
PMFs
and CDFs can be calculated for every possible number of successes, that is,
no vegetarian (two non-vegetarians), one vegetarian and two vegetarians out of
two participants sampled without replacement, as shown in the tables and function
argument boxes in both Diagrams 1 and 2.
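The whole distribution for this example can also be tabulated in a few lines. The sketch below uses exact fractions, so that the three probabilities can be confirmed to sum to one. The constant names are my own and are not taken from the Excel worksheet.

```python
from fractions import Fraction
from math import comb

# Values from the example: 40 participants, 18 vegetarians, 2 draws.
N_POP, N_VEG, N_DRAW = 40, 18, 2

total = comb(N_POP, N_DRAW)  # 780 possible pairs of participants
rows = []
running = Fraction(0)
for k in range(N_DRAW + 1):
    # Exact PMF for k vegetarians in the sample.
    pmf = Fraction(comb(N_VEG, k) * comb(N_POP - N_VEG, N_DRAW - k), total)
    running += pmf  # running total is the CDF up to k
    rows.append((k, pmf, running))

for k, pmf, cdf in rows:
    print(k, round(float(pmf), 4), round(float(cdf), 4))
# 0 0.2962 0.2962
# 1 0.5077 0.8038
# 2 0.1962 1.0
```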
The discussion in this note and my former Statistical Note
18 indicates that manual and automated calculations produce the same values and are both
useful for calculating the discrete probability distribution without replacement.
Conceptual understanding is the backbone and automation is efficient,
so both are important knowledge and skill sets.