Draw five cards without replacement from a deck of cards in one event and count the number of black
cards. Repeat the same process for seven events of five cards each.
Calculate the observed and theoretical discrete probability distributions of
the number of black cards.
The observed probability distribution is based on the data actually recorded in the experiment. The
theoretical probability distribution is based on an ideal situation. Working with
observed data helps in understanding the theory. The main
objective of this note is to develop an understanding of probability concepts and of the
two-category (binary) probability distribution without replacement using a simple
experiment.
Drawing some cards without replacement from a deck of cards is an
example of the two-category discrete probability distribution of sampling without
replacement. Refer also to my earlier Statistical Notes for clarity on
calculating the two-category discrete probability distribution of sampling
without replacement using the tree diagram, the formula and the Excel software function.
In this note, I first present the
observed data and then present the probability and two-category discrete
probability concepts using the event data. The earlier notes first clarified the
theory and then discussed the observations. This note works the other way round:
it first discusses the observed data using the tree diagram and then clarifies the
theoretical distribution, so that the observed data can be analyzed and
interpreted meaningfully in light of the theory.
Observed Data
I drew five cards without
replacement in an event (E) and repeated the same process for seven events.
Table 1 presents the outcome of drawing five cards without replacement in each
of the seven events. Black and red cards are coded B and R respectively for
symbolic representation. In addition, a cell with a black card is shaded black and
a cell with a red card is shaded red. I
will discuss this table further in the following sections.
Table 1: Outcomes in drawing five
cards without replacement in each of seven events (Black card=B and Red card=R)
Queries
Several questions may arise looking at the outcomes data in
Table 1. For example,
· How many unique outcomes, each constituting five cards, are there in Table 1?
· Why are some outcomes in the table the same and others different?
· Is there any pattern of outcomes in drawing five cards without replacement?
· How many unique outcomes of black and red cards will there be theoretically when five cards are drawn without replacement?
· How many groups of outcomes are there in the table of observed data?
· Why were there outcomes with only two to three black cards out of five cards drawn without replacement? Why not fewer or more black cards than that?
· How many different groups of outcomes will there be theoretically, from five black cards to no black card, in drawing five cards without replacement?
· What is the probability of the first event E1 (B,R,R,B and R) in Table 1, which has black and red cards in exactly this order?
· What is the probability of two black cards out of five cards drawn without replacement when the order does not matter, that is, regardless of the draw in which a black or a red card occurs?
· Looking at Table 1, what is the observed discrete probability distribution of the number of black cards?
· What is the theoretical probability distribution of the number of black cards in drawing five cards without replacement?
· How different is the observed from the theoretical discrete probability distribution of the number of black cards in drawing five cards without replacement?
Response
These questions can be answered
using different tools – Tree Diagram, Binomial Expansion, Binomial Distribution
function and Hypergeometric function. Look at specific statistical notes to get
answers to these questions. Below are responses to the queries:
Questions: How many unique
outcomes each constituting five cards are there in Table 1? Why are some event
outcomes in the table the same and others different? Is there any pattern of
outcomes in drawing five cards without replacement?
Drawing a card is unbiased, so that a black and a red card are equally likely
on the first draw, each with a probability of one half. Drawing a card is a random experiment
in which either a black or a red card may be drawn. In Table 1, there are five
unique outcomes, each constituting five cards. The first and seventh outcomes (E1 and
E7) are the same, and the third and fifth outcomes (E3 and E5) are the same. One outcome has three consecutive red cards
among the five cards drawn (E6). Three outcomes have two consecutive red cards
(E1, E2 and E7). Black cards have not occurred consecutively in
any of the seven events. One outcome has red and black cards occurring
alternately (E5). These observations indicate that there is no pattern in the occurrence of
black and red cards.
Question: How many unique
outcomes will there be theoretically of black and red cards of five cards drawn
without replacement?
Refer to my Statistical Note 6 on
the total number of possible outcomes of sampling without replacement. Each
unique outcome differs from the others in whether a black or a red card appears in
a certain draw. The total number of possible outcomes is calculated using
the formula ‘k to the power r’ or ‘k to the rth power’, written k^r,
where ‘k’ is the number of possible
outcomes in an experiment or trial and ‘r’ is the number of times an
experiment is conducted with replacement or the number of sampling units drawn
without replacement. In this example, ‘k’ is two and ‘r’ is five, so the total
number of possible outcomes is calculated by multiplying the two possibilities (black
or red card) for each of the five cards drawn without replacement. This is
2 × 2 × 2 × 2 × 2 = 32, that is, ‘two to the
power five’ or ‘two to the fifth power’, denoted by 2^5. Here, the
number of outcomes remains the same as with replacement (see Statistical Note
34). This is clearly seen in tree diagram 1 as well. Since the cards were
drawn without replacement, the principle of conditional probability applies to the
second and subsequent cards drawn. The outcomes with
the number of black cards ranging from five black cards, denoted by ‘5B’, to
zero black cards (all five red cards), denoted by ‘0B’, are indicated by
different colors in the third block from the right in tree diagram 1. In addition,
the outcomes of the seven events of drawing five cards without replacement listed in
Table 1 are among the 32 outcomes and are shown with the respective outcome
numbers E1 to E7 in different colors at the rightmost part of tree
diagram 1.
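The count of 32 colour sequences can be checked by listing them. The following is a minimal Python sketch, not part of the original notes; the variable names are illustrative, and it only enumerates the black/red patterns, not their probabilities.

```python
from itertools import product

# Each of the five draws shows either a black (B) or a red (R) card.
colors = ("B", "R")
draws = 5

# All ordered colour sequences, e.g. ('B', 'R', 'R', 'B', 'R') for event E1.
outcomes = list(product(colors, repeat=draws))

print(len(outcomes))               # 32, i.e. 2 ** 5
print(outcomes[0], outcomes[-1])   # all black first, all red last
```

The list corresponds to the 32 end points of the tree diagram; the probabilities attached to each path are a separate calculation.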
Questions: How many groups of
outcomes are there in the table of observed data? Why were there
outcomes with only two to three black cards in drawing five cards without
replacement? Why not less or more than those number of black cards?
There are two groups of
outcomes in the observed data, with two and three black cards in drawing five cards without
replacement. Some events have the same number of black cards. Five events (E1, E2,
E4, E6 and E7), each with five cards, have two black cards and two events (E3 and
E5) have three black cards. If such events were repeated many more
times, they could show other numbers of black cards, ranging from zero to
all five black cards in drawing five cards without replacement. Thus, there is
no guarantee that a specified number of black cards will occur in any number of
cards drawn.
Diagram 1: Tree Diagram
Showing Outcomes in Drawing Five Cards Without Replacement from a Deck of Cards
Question: How many groups of outcomes will there be theoretically
having five black cards to no black card in drawing five cards without
replacement?
Refer to my Statistical Note 15
on the grouping of unique outcomes in which the order does not matter. The
possible number of outcome groups is based on the number of black or red cards
drawn, irrespective of the order in which a black card occurs.
This is calculated using the formula C(k+r-1, r) = (k+r-1)! / [(k-1)! r!], where ‘C’
refers to the combination, ‘k’ is the number of possible outcomes in an experiment or trial and ‘r’ is
the number of times an experiment or a trial is conducted. In this example, ‘k’
is two and ‘r’ is five, so the number of groups of possible outcomes is
6! divided by (1! × 5!), equal to 6.
The grouping of outcomes by the
number of black cards is shown in tree diagram 1. There are six groups of
outcomes, from G1 to G6. G1 has only one outcome with five black cards in
drawing five cards without replacement, G2 has five outcomes with four black
cards, G3 has ten outcomes with three black cards, G4 has ten outcomes with two black
cards, G5 has five outcomes with one black card and G6 has one outcome with no black
card, that is, all red cards in drawing five cards without replacement.
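The six groups and their sizes can be reproduced by grouping the 32 colour sequences by their number of black cards. This is a small Python sketch under the same setting (k = 2 colours, r = 5 draws); the variable names are illustrative and not from the original notes.

```python
from collections import Counter
from itertools import product
from math import comb

k, r = 2, 5  # two colours, five cards drawn

# Number of groups when order does not matter: C(k + r - 1, r).
n_groups = comb(k + r - 1, r)

# Group the 2 ** 5 ordered sequences by how many black cards they contain.
group_sizes = Counter(seq.count("B") for seq in product("BR", repeat=r))

print(n_groups)  # 6
for blacks in range(r, -1, -1):  # G1 (five black) down to G6 (no black)
    print(blacks, group_sizes[blacks])
```

The sizes printed for five down to zero black cards are 1, 5, 10, 10, 5 and 1, matching G1 to G6.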
This grouping can be shown using
the Binomial Expansion formula. Let ‘B’ be the black card and ‘R’ be the
red card. Since five cards are drawn without replacement, the fifth power of the sum
of ‘B’ and ‘R’ is used for the Binomial expansion. The expansion of (B+R)^5
is expressed as:
(B+R)^5 = B^5 + 5B^4R + 10B^3R^2 + 10B^2R^3 + 5BR^4 + R^5
‘B^5’ means there is
one outcome (G1) having five black cards in drawing five cards from a deck. In
the same way, ‘5B^4R’ means there are five outcomes having four black
cards and one red card, ‘10B^3R^2’ means 10 outcomes with
three black cards and two red cards, ‘10B^2R^3’ means 10
outcomes with two black cards and three red cards, ‘5BR^4’ means five
outcomes with one black card and four red cards, and ‘R^5’ means one
outcome constituting five red cards in drawing five cards from a deck.
The number of outcomes
in a certain group of outcomes or the Binomial coefficient can be identified
using Pascal’s Triangle as discussed in my Statistical Note 16, as shown in
Diagram 2.
Diagram 2: Number of cards
drawn from a deck and Binomial Coefficient using Pascal’s Triangle
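As an illustration of how Pascal's Triangle yields these coefficients, the sketch below builds each row by adding adjacent entries of the row above. The function name `pascal_rows` is hypothetical and chosen only for this example.

```python
def pascal_rows(max_n):
    """Return rows 0..max_n of Pascal's Triangle as lists of integers."""
    rows = [[1]]
    for _ in range(max_n):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries directly above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(5)
for n, row in enumerate(rows):
    print(n, row)  # n = number of cards drawn, row = Binomial coefficients
```

The row for five cards is 1, 5, 10, 10, 5, 1, the same coefficients as in the expansion of (B+R)^5.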
With an increase in the number
of cards drawn, say 20 cards, it becomes difficult to draw the tree diagram as
well as to write out the Binomial expansion of (B+R)^20, particularly the
coefficients of each group of outcomes. In such a case, the Binomial Distribution function is used to identify the coefficients.
The formula is C(n,x) B^x R^(n-x), where ‘n’ is the number of trials, ‘x’ is the number of black cards,
‘B’ is the black card and ‘R’ is the red card. For example, I use the Binomial
distribution formula to calculate the number of outcomes in the group
constituting three black and two red cards in drawing five cards without
replacement from a deck. It is C(5,3) B^3 R^2, equivalent to
10 B^3 R^2, which is the same as the third term in the above
Binomial expansion.
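Using the Binomial distribution function, as discussed in my Statistical
Note 16, the expansion is:
(B+R)^n = C(n,0)B^n + C(n,1)B^(n-1)R + C(n,2)B^(n-2)R^2 + C(n,3)B^(n-3)R^3 + … + C(n,x)B^(n-x)R^x + … + C(n,n)R^n
In this expansion ‘x’ is the exponent of R, that is, the number of red cards in the term; since C(n,x) = C(n,n-x), the coefficients are the same whichever colour is counted.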
This expression can be used to calculate the number of outcomes in a
certain group of black cards and ultimately the total number of outcomes for
the given number of events or experiments without replacement.
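For a larger number of cards, the coefficients can be computed directly instead of being read from a tree diagram. The sketch below uses Python's `math.comb` for C(n, x); the choice of n = 20 follows the example in the text, and the variable names are illustrative.

```python
from math import comb

# Coefficient of the group with three black and two red cards out of five.
print(comb(5, 3))  # 10

# All coefficients of (B+R)^20, from C(20, 0) to C(20, 20).
n = 20
coefficients = [comb(n, x) for x in range(n + 1)]

print(coefficients[10])            # 184756, the largest (middle) coefficient
print(sum(coefficients), 2 ** n)   # both 1048576: total number of outcomes
```

Summing the coefficients recovers the total number of outcomes, 2^n, which ties the grouping back to the count of unique outcomes.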
The same expansion gives the number of outcomes in each group, that is, the
Binomial coefficient, for the Hypergeometric distribution as well. The
main difference lies in calculating the probability of a group of outcomes:
because the cards are not replaced, the probability of the second and each
subsequent card within an outcome is conditional on the cards already drawn,
and the denominator of that probability decreases by one with each draw
(52, 51, 50, 49 and 48).
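To illustrate the point of the paragraph above, the sketch below uses the same coefficient C(5,2) = 10 with two different per-outcome probabilities. The with-replacement case assumes a constant probability of 1/2 per draw (an assumption made here for comparison); the without-replacement case follows one path, B,B,R,R,R, through a 52-card deck with 26 black and 26 red cards.

```python
from fractions import Fraction
from math import comb

coefficient = comb(5, 2)  # ten outcomes contain exactly two black cards

# With replacement (assumed p = 1/2 per draw): every sequence has (1/2)^5.
p_seq_with = Fraction(1, 2) ** 5

# Without replacement, sequence B,B,R,R,R: the denominators fall 52, 51, ..., 48.
p_seq_without = (Fraction(26, 52) * Fraction(25, 51) * Fraction(26, 50)
                 * Fraction(25, 49) * Fraction(24, 48))

p_with = coefficient * p_seq_with
p_without = coefficient * p_seq_without

print(float(p_with))               # 0.3125
print(round(float(p_without), 6))  # 0.32513
```

Multiplying by the coefficient is valid in the second case only because every ordering of two black and three red cards has the same probability without replacement.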
Question: What is the
probability of the first event E1 (B,R,R,B and R) in Table 1 that
has black and red cards in exactly this order?
The joint probability is calculated by multiplying the marginal
probability of the first card by the conditional probabilities of the remaining
cards drawn without replacement. There is only one outcome that has black cards
in the first and fourth draws and red cards in the second, third and fifth draws of five
cards drawn without replacement from a deck of cards. Thus, the probability of event
E1 (B,R,R,B and R) is the product
of the marginal probability of the first black card and the conditional
probabilities of the remaining four cards (R,R,B and R). Using these
probability values from tree diagram 1, P(B,R,R,B and R) is the product of
26/52, 26/51, 25/50, 25/49 and 24/48, equal to 0.032513.
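The chain of marginal and conditional probabilities can be followed draw by draw with exact fractions. This is a minimal sketch assuming a standard deck of 26 black and 26 red cards; the helper name `sequence_probability` is hypothetical.

```python
from fractions import Fraction

def sequence_probability(sequence, black=26, red=26):
    """Probability of drawing the colours in `sequence` in exactly this order,
    without replacement, from a deck with `black` black and `red` red cards."""
    remaining = {"B": black, "R": red}
    prob = Fraction(1)
    for color in sequence:
        total = remaining["B"] + remaining["R"]
        prob *= Fraction(remaining[color], total)  # conditional on earlier draws
        remaining[color] -= 1                      # the drawn card is not replaced
    return prob

p_e1 = sequence_probability("BRRBR")  # event E1: B, R, R, B and R
print(p_e1, round(float(p_e1), 6))
```

For E1 the factors are 26/52, 26/51, 25/50, 25/49 and 24/48: after two black and two red cards have been drawn, 24 red cards remain among the last 48.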
Question: What is the
probability of two black cards out of five cards drawn without replacement in
which the order does not matter whether a black or red card occurs in which draw
out of five cards?
Looking at tree diagram 1, there are 10 outcomes constituting two
black cards under group G4. They are: first outcome (B,B,R,R and R);
second outcome (B,R,B,R and R); third outcome (B,R,R,B and R); fourth outcome
(B,R,R,R and B); fifth outcome (R,B,B,R and R); sixth outcome (R,B,R,B and R);
seventh outcome (R,B,R,R and B); eighth outcome (R,R,B,B and R); ninth outcome
(R,R,B,R and B); and tenth outcome (R,R,R,B and B). The probability of two
black cards, P(2B), out of five
cards drawn without replacement is equal to the sum of the joint
probabilities of these ten outcomes. Thus,
P(2B)= P(B∩B∩R∩R∩R)+P(B∩R∩B∩R∩R)+P(B∩R∩R∩B∩R)+P(B∩R∩R∩R∩B)+P(R∩B∩B∩R∩R)+P(R∩B∩R∩B∩R)+P(R∩B∩R∩R∩B)+P(R∩R∩B∩B∩R)+ P(R∩R∩B∩R∩B)+ P(R∩R∩R∩B∩B)
P(2B) = (26×25×26×25×24)/(52×51×50×49×48)+(26×26×25×25×24)/(52×51×50×49×48)+
(26×26×25×25×24)/(52×51×50×49×48)+(26×26×25×24×25)/(52×51×50×49×48)+
(26×26×25×25×24)/(52×51×50×49×48)+(26×26×25×25×24)/(52×51×50×49×48)+
(26×26×25×24×25)/(52×51×50×49×48)+(26×25×26×25×24)/(52×51×50×49×48)+
(26×25×26×24×25)/(52×51×50×49×48)+(26×25×24×26×25)/(52×51×50×49×48)
Every outcome uses the same five numerator factors (26 and 25 for the two black cards; 26, 25 and 24 for the three red cards), only in a different order, so each numerator equals 10,140,000.
P(2B) = (10 × 10,140,000)/(52×51×50×49×48)
= 101,400,000/311,875,200 = 0.325130
The probability of two black cards out of five cards drawn without
replacement can also be calculated using the Hypergeometric distribution formula.
Let X be the random variable of interest (the number of black
cards in the sample of five cards drawn without replacement), and let ‘x’ be the
value it takes, here two. The probability distribution
of X depends on the parameters ‘n’ (the number of cards drawn), ‘M’ (the number of
black cards in the deck) and ‘N’ (the total number of cards in the deck), and is given by the
expression
P(X=x) = h(x;n,M,N) = [C(M,x) × C(N-M,n-x)] / C(N,n)
In this example, n=5, M=26, N=52 and ‘x’ takes the value
2. Putting these values in the above formula, one gets
P(X=2) = [C(26,2) × C(26,3)] / C(52,5) = (26! × 26! × 47! ×
5!) / (24! × 2! × 23! × 3! × 52!) = 0.325130
This value is
equal to the one calculated above. In the same way, the probability for any other
number of black cards in five cards drawn without replacement can be calculated.
Excel software can also be used to calculate the probability using the Hypergeometric
formula and the Hypergeometric function.
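Outside Excel, the same formula can be evaluated in a few lines. The sketch below is an illustrative Python version using `math.comb`; the function name `hypergeom_pmf` is hypothetical, and its parameters follow the symbols x, n, M and N used above.

```python
from math import comb

def hypergeom_pmf(x, n, M, N):
    """P(X = x): x black cards in n draws without replacement
    from a deck of N cards containing M black cards."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

p_two_black = hypergeom_pmf(2, n=5, M=26, N=52)
print(f"{p_two_black:.6f}")  # 0.325130
```

Here C(26,2) = 325, C(26,3) = 2,600 and C(52,5) = 2,598,960, so the probability is 845,000/2,598,960.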
Question: Looking at Table 1, what will be the
observed discrete probability distribution of number of black cards?
To summarize, the number of black
cards out of five cards drawn without replacement in the seven events ranged from two
to three (Table 1). Two black cards were drawn in five of the seven events
(E1, E2, E4, E6 and E7) and three black cards were drawn in two events
(E3 and E5). Thus, in the observed data, two black cards occurred most often, five out
of seven times, with observed probability P(X=2) = 5/7 = 0.714286, highlighted yellow in Table 2.
Three black cards occurred with observed probability P(X=3) = 2/7 = 0.285714, and every
other number of black cards has an observed probability of zero.
Table 2: Number of Black Cards Out
of Five Cards Drawn Without Replacement in Each of Seven Events and Observed
Probability Distribution of Number of Black Cards
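The observed distribution is simply the relative frequency of each number of black cards across the seven events. The sketch below uses only the counts stated in the text (two black cards in E1, E2, E4, E6 and E7; three in E3 and E5); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Number of black cards in each event, as reported in the text.
black_counts = {"E1": 2, "E2": 2, "E3": 3, "E4": 2, "E5": 3, "E6": 2, "E7": 2}

frequency = Counter(black_counts.values())
n_events = len(black_counts)

# Relative frequency for every possible number of black cards, 0 to 5.
observed = {x: Fraction(frequency[x], n_events) for x in range(6)}

for x, p in observed.items():
    print(x, frequency[x], f"{float(p):.6f}")
```

Only x = 2 and x = 3 receive non-zero probability, 5/7 and 2/7, and the probabilities sum to one.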
Question: What will be the theoretical probability
distribution of number of black cards in drawing five cards without replacement?
The probability of every group of outcomes can be calculated as
in the former section to present the probability distribution in
Table 3. In addition, the Hypergeometric distribution formula and function in Excel are
used to calculate the two-category theoretical probability distribution
without replacement (Table 4). Refer to my Statistical Notes 31 and 32, which discuss
the calculation of the theoretical two-category discrete probability distribution.
Two or three black cards in drawing five cards without
replacement have the highest probability and are thus the most likely to occur. These
are highlighted yellow. The likelihood decreases on both sides of two and
three black cards. The two extreme numbers of black cards, zero and five,
have the least chance of occurrence.
Table 3: Number of Black
Cards, Number of Outcome Groups and Theoretical Probability Distribution of
Number of Black Cards
Table 4: Number of Black Cards, Number of Red Cards,
Hypergeometric Distribution Formula and Function Used to Calculate the Theoretical
Probability Distribution of Number of Black Cards of Five Cards Drawn Without
Replacement
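The full theoretical distribution can be tabulated by applying the Hypergeometric formula to every possible number of black cards. This is a sketch assuming n = 5, M = 26 and N = 52 as in the example above; it is not a reproduction of the original Tables 3 and 4.

```python
from math import comb

n, M, N = 5, 26, 52  # cards drawn, black cards in the deck, deck size

# P(X = x) for x = 0..5 black cards, with the number of outcomes per group.
theoretical = {
    x: comb(M, x) * comb(N - M, n - x) / comb(N, n) for x in range(n + 1)
}

for x, p in theoretical.items():
    print(x, n - x, comb(n, x), f"{p:.6f}")  # black, red, outcomes in group, P

print(sum(theoretical.values()))  # the probabilities sum to 1 (up to rounding)
```

The distribution is symmetric because the deck holds equal numbers of black and red cards, with the largest probabilities at two and three black cards.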
Question: How different
will be the observed from the theoretical discrete probability distribution of
number of black cards in drawing five cards without replacement?
Chart 1 compares the
observed and theoretical two-category discrete probability distributions of the number of black
cards in drawing five cards without replacement.
It clearly shows the bell-shaped,
symmetric line chart of the theoretical probability distribution and how different
the observed distribution is. In contrast, the observed distribution is positively
skewed.
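The comparison can also be made numerically. The sketch below sets the observed relative frequencies from the seven events beside the Hypergeometric probabilities and compares their means; the variable names are illustrative, and the comparison measure is chosen here only as an example.

```python
from math import comb

n, M, N = 5, 26, 52

# Observed relative frequencies from the seven events (2 black: 5/7, 3 black: 2/7).
observed = {0: 0.0, 1: 0.0, 2: 5 / 7, 3: 2 / 7, 4: 0.0, 5: 0.0}

# Theoretical Hypergeometric probabilities.
theoretical = {
    x: comb(M, x) * comb(N - M, n - x) / comb(N, n) for x in range(n + 1)
}

differences = {x: observed[x] - theoretical[x] for x in range(n + 1)}
mean_observed = sum(x * p for x, p in observed.items())
mean_theoretical = sum(x * p for x, p in theoretical.items())

for x in range(n + 1):
    print(x, f"{observed[x]:.6f}", f"{theoretical[x]:.6f}", f"{differences[x]:+.6f}")
print(round(mean_observed, 4), round(mean_theoretical, 4))
```

The observed mean is 16/7, about 2.29 black cards, against a theoretical mean of 2.5, which reflects the excess of two-black-card events in this small sample.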
Conclusion
The tree diagram, Binomial
expansion, Binomial distribution function and Hypergeometric function are
important tools for calculating the number of outcomes and the probability of samples
drawn without replacement. The observed two-category probability distribution
differs from the theoretical distribution. The observed data can differ from
one event to another because each draw is random and only seven events were
observed, and possibly also because of non-uniformity in the conditions under which a card
is drawn without replacement.