Family Tree

About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) an MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) an MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) an MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; the Training Institute for Technical Instruction (TITI) from July to Sep 2011; UN Women Nepal from Sep to Dec 2011; and Mercy Corps Nepal from 24 Jan 2012 to 14 August 2016. I joined CAMRIS International in Nepal on 1 February 2017. I have published articles to my credit.

Saturday, September 15, 2018

Two-category Discrete Probability Distribution of Sampling Without Replacement, Observation and Theory, Statistical Note 35

Draw five cards without replacement from a deck of cards and count the number of black cards; this is one event. Repeat the same process for seven events, each constituting five cards. Calculate the observed and theoretical discrete probability distributions of the number of black cards.

The observed probability distribution is based on data actually collected. The theoretical probability distribution is based on an ideal situation. Using observed data helps in understanding the theory. The main objective of this note is to develop an understanding of probability and of the two-category, or binary, probability distribution without replacement using a simple experiment.

Drawing some cards without replacement from a deck of cards is an example of the two-category discrete probability distribution of sampling without replacement. Refer also to my earlier Statistical Notes for clarity on calculating the two-category discrete probability distribution of sampling without replacement using the tree diagram, the formula and the Excel software function.

In this note, I first present the observed data and then present the probability and two-category discrete probability concepts using the event data. Former notes first clarified the theory and then discussed the observations. This note works the other way round: it first discusses the observed data using the tree diagram and then clarifies the theoretical distribution, so that the observed data can be analyzed and interpreted meaningfully in light of the theory.

Observed Data

I drew five cards without replacement in an event (E) and repeated the same process for seven events. Table 1 presents the outcome of drawing five cards without replacement in each of the seven events. Black and red cards are coded B and R respectively. In addition, a cell with a black card is shaded black and a cell with a red card is shaded red. I will discuss this table further in the following sections.

Table 1: Outcomes in drawing five cards without replacement in each of seven events (Black card=B and Red card=R)







Queries
Several questions may arise from the outcome data in Table 1. For example:
· How many unique outcomes, each constituting five draws, are there in Table 1?
· Why are some outcomes in the table the same and others different?
· Is there any pattern of outcomes in drawing five cards without replacement?
· How many unique outcomes of black and red cards will there be theoretically when five cards are drawn without replacement?
· How many groups of outcomes are there in the table of observed data?
· Why were there outcomes with only two to three black cards of five cards drawn without replacement? Why not fewer or more than that number of black cards?
· How many different groups of outcomes will there be theoretically, from five black cards to no black card, in drawing five cards without replacement?
· What is the probability of the first event E1 (B,R,R,B and R) in Table 1 that has black and red cards in exactly this order?
· What is the probability of two black cards out of five cards drawn without replacement when the order does not matter, that is, regardless of which draws produce the black cards?
· Looking at Table 1, what will be the observed discrete probability distribution of the number of black cards?
· What will be the theoretical probability distribution of the number of black cards in drawing five cards without replacement?
· How different will the observed distribution be from the theoretical discrete probability distribution of the number of black cards in drawing five cards without replacement?

Response

These questions can be answered using different tools – Tree Diagram, Binomial Expansion, Binomial Distribution function and Hypergeometric function. Look at specific statistical notes to get answers to these questions. Below are responses to the queries:

Questions: How many unique outcomes, each constituting five draws, are there in Table 1? Why are some event outcomes in the table the same and others different? Is there any pattern of outcomes in drawing five cards without replacement?

For the first card, drawing is unbiased: black and red cards are equally likely, each with a probability of one half. Drawing a card is a random experiment in which either a black or a red card may be drawn. In Table 1 there are five unique outcomes, each constituting five cards. The first and seventh outcomes (E1 and E7) are the same, and the third and fifth outcomes (E3 and E5) are the same. One outcome has three consecutive red cards (E6). Three outcomes have two consecutive red cards (E1, E2 and E7). Black cards have not occurred consecutively in any of the seven events. One outcome has red and black cards alternating (E5). These observations indicate that there is no pattern in the occurrence of black and red cards.

Question: How many unique outcomes will there be theoretically of black and red cards of five cards drawn without replacement?

Refer to my Statistical Note 6 on the total number of possible outcomes of sampling without replacement. Each unique outcome differs from the others in the color, black or red, of the card in at least one draw. The total number of possible outcomes is calculated using the formula ‘k to the power r’ or ‘k to the rth power’, written k^r, where ‘k’ is the number of possible outcomes in an experiment or trial and ‘r’ is the number of times an experiment is conducted with replacement or the number of sampling units drawn without replacement. In this example, ‘k’ is two and ‘r’ is five, so the total number of possible outcomes is found by multiplying the two possibilities (black or red card) across the five draws: 2 × 2 × 2 × 2 × 2 = 32, that is, ‘two to the power five’ or ‘two to the fifth power’, denoted by 2^5. Here, the number of outcomes remains the same as that with replacement (see Statistical Note 34). This is clearly seen in tree diagram 1 as well. Since the cards were drawn without replacement, the principle of conditional probability applies to the second and subsequent cards. The outcomes with the number of black cards ranging from five black cards, denoted by ‘5B’, to zero black cards (all five red cards), denoted by ‘0B’, are indicated by different colors in the third block from the right in tree diagram 1. In addition, the outcomes of the seven events listed in Table 1 are among these 32 outcomes and are shown with their respective event numbers E1 to E7 in different colors at the rightmost part of tree diagram 1.
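The count of 32 can be checked by listing every ordered sequence of colors. The sketch below is only an illustration, not part of the original note; the names `colors`, `draws` and `outcomes` are assumptions made for this example.

```python
from itertools import product

# Each of the r = 5 draws shows one of k = 2 colors: B (black) or R (red).
colors = ("B", "R")
draws = 5

# Every ordered color sequence, e.g. ('B', 'R', 'R', 'B', 'R') for event E1.
outcomes = list(product(colors, repeat=draws))

print(len(outcomes))          # number of sequences listed
print(len(colors) ** draws)   # the formula k ** r
```

Both printed numbers should agree, and the sequence for E1 appears once in the list.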

Questions: How many groups of outcomes are there in the table of observed data? Why were there outcomes with only two to three black cards in drawing five cards without replacement? Why not fewer or more than that number of black cards?

There are two groups of outcomes, with two and three black cards in drawing five cards without replacement. Some events have the same number of black cards. Five events (E1, E2, E4, E6 and E7), each with five cards, have two black cards and two events (E3 and E5) have three black cards. If such events were repeated many more times, some could have other numbers of black cards, ranging from zero to all five. Thus, there is no guarantee that a specified number of black cards will occur in any number of cards drawn.

Diagram 1: Tree Diagram Showing Outcomes in Drawing Five Cards Without Replacement from a Deck of Cards
























Question: How many groups of outcomes will there be theoretically having five black cards to no black card in drawing five cards without replacement?

Refer to my Statistical Note 15 on the grouping of unique outcomes in which the order does not matter. The possible number of outcome groups is based on the number of black or red cards drawn, irrespective of the order in which the black cards occur. This is calculated using the formula C(k+r-1, r) = (k+r-1)!/[(k-1)! × r!], where ‘C’ refers to the combination, ‘k’ is the number of possible outcomes in an experiment or trial and ‘r’ is the number of times an experiment or a trial is conducted. In this example, ‘k’ is two and ‘r’ is five, so the number of groups of possible outcomes is 6! divided by (1! × 5!), equal to 6.

The grouping of outcomes by the number of black cards is shown in tree diagram 1. There are six groups of outcomes, ranging from G1 to G6. G1 has only one outcome with five black cards in drawing five cards without replacement, G2 has five outcomes with four black cards, G3 has 10 outcomes with three black cards, G4 has 10 outcomes with two black cards, G5 has five outcomes with one black card and G6 has one outcome with no black card, that is, all red cards.
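The number of groups and the size of each group can be reproduced with a short script. This is a sketch for illustration only; the variable names `n_groups` and `group_sizes` are assumptions of this example.

```python
from collections import Counter
from itertools import product
from math import comb

k, r = 2, 5  # two colors, five cards drawn

# Number of groups when order does not matter: C(k + r - 1, r).
n_groups = comb(k + r - 1, r)

# Size of each group: count the ordered sequences by their number of black cards.
group_sizes = Counter(seq.count("B") for seq in product("BR", repeat=r))

print(n_groups)
for blacks in range(r, -1, -1):  # G1 (five black cards) down to G6 (no black card)
    print(blacks, group_sizes[blacks])
```

The group sizes add up to the 32 ordered outcomes.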

This grouping can be shown using the Binomial Expansion formula. Let ‘B’ be the black card and ‘R’ be the red card. Since five cards are drawn without replacement, the fifth power of the sum of ‘B’ and ‘R’ is used for the Binomial expansion. The expansion of (B+R)^5 is expressed as:
(B+R)^5 = B^5 + 5B^4R + 10B^3R^2 + 10B^2R^3 + 5BR^4 + R^5

‘B^5’ means there is one outcome, G1, having five black cards in drawing five cards from a deck. In the same way, ‘5B^4R’ means there are five outcomes having four black cards and one red card, ‘10B^3R^2’ means 10 outcomes with three black cards and two red cards, ‘10B^2R^3’ means 10 outcomes with two black cards and three red cards, ‘5BR^4’ means five outcomes with one black card and four red cards, and ‘R^5’ means one outcome constituting five red cards.

The number of outcomes in a certain group of outcomes or the Binomial coefficient can be identified using Pascal’s Triangle as discussed in my Statistical Note 16, as shown in Diagram 2.


Diagram 2: Number of cards drawn from a deck and Binomial Coefficient using Pascal’s Triangle









With the increase in the number of cards drawn, say 20 cards, it becomes difficult to draw the tree diagram and to write out the Binomial expansion of (B+R)^20, particularly the coefficients of each group of outcomes. In such a case, the Binomial Distribution function is used to identify the coefficients. The formula is C(n,x) B^x R^(n-x), where ‘n’ is the number of trials, ‘x’ is the number of black cards, ‘B’ is the black card and ‘R’ is the red card. For example, I use the Binomial distribution formula to calculate the number of outcomes in the group constituting three black and two red cards in drawing five cards without replacement from a deck. It is C(5,3) B^3 R^2, equivalent to 10B^3R^2, which is the same as the third term in the above Binomial expansion.

Using the Binomial distribution function, as discussed in my Statistical Note 16, the expansion, in which ‘x’ counts the red cards in a term, is:
(B+R)^n = C(n,0)B^n + C(n,1)B^(n-1)R + C(n,2)B^(n-2)R^2 + C(n,3)B^(n-3)R^3 + … + C(n,x)B^(n-x)R^x + … + C(n,n)R^n
This expression can be used to calculate the number of outcomes in a certain group of black cards and ultimately the total number of outcomes for the given number of events or experiments without replacement.
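For larger numbers of cards the coefficients can be generated rather than read from a tree diagram. The following is a minimal sketch; the helper name `binomial_coefficients` is hypothetical and introduced only for this example.

```python
from math import comb

def binomial_coefficients(n):
    """Return the coefficients C(n, 0), C(n, 1), ..., C(n, n) of (B+R)^n."""
    return [comb(n, x) for x in range(n + 1)]

# Five cards: the coefficients of the expansion of (B+R)^5.
print(binomial_coefficients(5))

# Twenty cards: too many branches for a tree diagram, but easy to compute.
coefficients_20 = binomial_coefficients(20)
print(coefficients_20)
print(sum(coefficients_20))  # total number of ordered outcomes, 2 ** 20
```

The sum of the coefficients equals k^r, which ties the grouping back to the total number of outcomes.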

The same expansion can also be used to identify the number of outcomes in each group, that is the Binomial coefficient, for the Hypergeometric distribution. The main change lies in calculating the probability of a group of outcomes: for the second and subsequent cards within an outcome, the conditional probability changes from draw to draw because the denominator (the number of cards left in the deck) decreases by one after every draw, and the numerator (the number of cards left of the color being drawn) decreases whenever a card of that color has already been drawn.

Question: What is the probability of the first event E1 (B,R,R,B and R) in Table 1 that has black and red cards in exactly this order?

The joint probability is calculated by multiplying the marginal probability of the first card by the conditional probabilities of the remaining cards drawn without replacement. There is only one outcome that has black cards in the first and fourth draws and red cards in the second, third and fifth draws of five cards drawn without replacement from a deck of cards. Thus, the probability of event E1 (B,R,R,B and R) is the product of the marginal probability of the first black card and the conditional probabilities of the remaining four cards (R,R,B and R). Using the probability values from tree diagram 1, P(B,R,R,B and R) is the product of 26/52, 26/51, 25/50, 25/49 and 24/48, equal to 0.032513. The last factor is 24/48 because two red cards have already been drawn, leaving 24 red cards among the 48 cards that remain.
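The multiplication of marginal and conditional probabilities can be sketched in code. This is an illustration under the assumption of a standard deck with 26 black and 26 red cards; the function name `sequence_probability` is hypothetical.

```python
from fractions import Fraction

def sequence_probability(sequence, black=26, red=26):
    """Probability of drawing the given color sequence, in exactly this
    order, without replacement (assumes 26 black and 26 red cards)."""
    remaining = {"B": black, "R": red}
    probability = Fraction(1)
    for card in sequence:
        cards_left = remaining["B"] + remaining["R"]
        # Conditional probability of this color given the cards already drawn.
        probability *= Fraction(remaining[card], cards_left)
        remaining[card] -= 1  # the drawn card is not put back
    return probability

p_e1 = sequence_probability("BRRBR")  # event E1
print(p_e1, round(float(p_e1), 6))
```

Exact fractions are used so that rounding happens only when the result is printed.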

Question: What is the probability of two black cards out of five cards drawn without replacement when the order does not matter, that is, regardless of which draws produce the black cards?

Looking at tree diagram 1, there are 10 outcomes constituting two black cards under group G4. They are: first outcome (B,B,R,R and R); second outcome (B,R,B,R and R); third outcome (B,R,R,B and R); fourth outcome (B,R,R,R and B); fifth outcome (R,B,B,R and R); sixth outcome (R,B,R,B and R); seventh outcome (R,B,R,R and B); eighth outcome (R,R,B,B and R); ninth outcome (R,R,B,R and B); and tenth outcome (R,R,R,B and B). The probability of two black cards, P(2B), out of five cards drawn without replacement is equal to the sum of the joint probabilities of these ten outcomes. Thus,

P(2B)= P(B∩B∩R∩R∩R)+P(B∩R∩B∩R∩R)+P(B∩R∩R∩B∩R)+P(B∩R∩R∩R∩B)+P(R∩B∩B∩R∩R)+P(R∩B∩R∩B∩R)+P(R∩B∩R∩R∩B)+P(R∩R∩B∩B∩R)+ P(R∩R∩B∩R∩B)+ P(R∩R∩R∩B∩B)

P(2B) = (26×25×26×25×24)/(52×51×50×49×48) + (26×26×25×25×24)/(52×51×50×49×48) + (26×26×25×25×24)/(52×51×50×49×48) + (26×26×25×24×25)/(52×51×50×49×48) + (26×26×25×25×24)/(52×51×50×49×48) + (26×26×25×25×24)/(52×51×50×49×48) + (26×26×25×24×25)/(52×51×50×49×48) + (26×25×26×25×24)/(52×51×50×49×48) + (26×25×26×24×25)/(52×51×50×49×48) + (26×25×24×26×25)/(52×51×50×49×48)

Every numerator contains the same five factors (26 and 25 for the two black cards; 26, 25 and 24 for the three red cards) in a different order, so each of the ten outcomes has the same numerator, 10,140,000.

P(2B) = (10 × 10,140,000)/(52×51×50×49×48) = 101,400,000/311,875,200 = 0.325130
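The sum over the ten orderings can be verified by enumerating the positions of the two black cards. This is a sketch only; it assumes a deck of 26 black and 26 red cards, and the names `sequence_probability` and `orderings` are introduced for illustration.

```python
from fractions import Fraction
from itertools import combinations

def sequence_probability(sequence, black=26, red=26):
    """Probability of one ordered color sequence drawn without replacement."""
    remaining = {"B": black, "R": red}
    probability = Fraction(1)
    for card in sequence:
        probability *= Fraction(remaining[card], remaining["B"] + remaining["R"])
        remaining[card] -= 1
    return probability

# Choose which two of the five draws are black; the rest are red.
orderings = [
    "".join("B" if i in black_positions else "R" for i in range(5))
    for black_positions in combinations(range(5), 2)
]

p_two_black = sum(sequence_probability(s) for s in orderings)
print(len(orderings))
print(p_two_black, round(float(p_two_black), 6))
```

Because every ordering has the same probability, the sum equals ten times the probability of any single ordering.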

The probability of two black cards out of five cards drawn without replacement can also be calculated using the Hypergeometric distribution formula.

Let X be the random variable of interest, the number of black cards in the sample of five cards drawn without replacement, and let ‘x’ denote the value it takes, here two. The probability distribution of X depends on the parameters ‘n’ (the number of cards drawn), ‘M’ (the number of black cards in the deck) and ‘N’ (the total number of cards in the deck), and is given by the expression
P(X=x) = h(x;n,M,N) = [C(M,x) × C(N-M,n-x)]/C(N,n)

In this example, n=5, M=26, N=52 and ‘x’ takes the value 2. Putting these values in the above formula, one gets

P(X=2) = [C(26,2) × C(26,3)]/C(52,5) = (26! × 26! × 47! × 5!) / (24! × 2! × 23! × 3! × 52!) = (325 × 2,600)/2,598,960 = 0.325130

This value is equal to the one calculated above. In the same way, the probability for other numbers of black cards in five cards drawn without replacement can be calculated. Excel software can also be used to calculate the probability using the Hypergeometric formula and the Hypergeometric function.
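The formula h(x;n,M,N) translates directly into code. The sketch below is an illustration and not the Excel function mentioned in the text; the function name `hypergeometric_pmf` is an assumption of this example.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, n, M, N):
    """h(x; n, M, N): probability of x black cards in n cards drawn without
    replacement from N cards, of which M are black."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

p = hypergeometric_pmf(2, n=5, M=26, N=52)
print(comb(26, 2), comb(26, 3), comb(52, 5))  # the three combinations
print(p, round(float(p), 6))
```

Changing the first argument gives the probability for any other number of black cards from zero to five.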

Question: Looking at Table 1, what will be the observed discrete probability distribution of number of black cards?

To summarize, the number of black cards out of five cards drawn without replacement in seven events ranged from two to three (Table 1). Two black cards were drawn in five of the seven events (E1, E2, E4, E6 and E7) and three black cards in two events (E3 and E5). Thus, two black cards was the most frequent result, with observed probability P(X=2) = 5/7 = 0.714286, highlighted yellow in Table 2; the observed probability of three black cards is P(X=3) = 2/7 = 0.285714.
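The observed distribution is a relative-frequency count. The sketch below uses only the number of black cards per event as stated in the text; the names `black_counts` and `observed` are assumptions of this example.

```python
from collections import Counter
from fractions import Fraction

# Number of black cards in each event, as described in the text.
black_counts = {"E1": 2, "E2": 2, "E3": 3, "E4": 2, "E5": 3, "E6": 2, "E7": 2}

frequency = Counter(black_counts.values())
events = len(black_counts)

# Observed probability = number of events with x black cards / number of events.
observed = {x: Fraction(frequency[x], events) for x in range(6)}

for x, p in observed.items():
    print(x, p, round(float(p), 6))
```

Numbers of black cards that never occurred in the seven events get an observed probability of zero.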

Table 2: Number of Black Cards Out of Five Cards Drawn Without Replacement in Each of Seven Events and Observed Probability Distribution of Number of Black Cards





Question: What will be the theoretical probability distribution of number of black cards in drawing five cards without replacement?

The probability of every group of outcomes can be calculated as in the former section to present the probability distribution in Table 3. In addition, the Hypergeometric distribution formula and function in Excel are used to calculate the two-category theoretical probability distribution without replacement (Table 4). Refer to my Statistical Notes 31 and 32, which discuss the calculation of the Theoretical Two-Category Discrete Probability Distribution.

Two and three black cards in drawing five cards without replacement have the highest probability and are thus the most likely to occur. These are highlighted yellow. The likelihood decreases on both sides of two and three black cards. The two extreme numbers of black cards, zero and five, have the least chance of occurrence.
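The whole theoretical distribution can be built from the number of outcomes in each group multiplied by the probability of one outcome in that group. This is a sketch under the assumption of 26 black and 26 red cards; the name `theoretical` is introduced only for this example.

```python
from fractions import Fraction
from math import comb, perm

n, M, N = 5, 26, 52  # cards drawn, black cards in the deck, cards in the deck

theoretical = {}
for x in range(n + 1):
    outcomes_in_group = comb(n, x)  # orderings with x black cards
    # Probability of any one ordering: products of the falling counts.
    one_ordering = Fraction(perm(M, x) * perm(N - M, n - x), perm(N, n))
    theoretical[x] = outcomes_in_group * one_ordering

for x, p in theoretical.items():
    print(x, comb(n, x), round(float(p), 6))
```

The printed probabilities are symmetric around two and three black cards because the deck holds equal numbers of black and red cards.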

Table 3: Number of Black Cards, Number of Outcome Groups and Theoretical Probability Distribution of Number of Black Cards
























Table 4: Number of Black Cards, Number of Red Cards, Hypergeometric Distribution Formula and Function Used to Calculate the Theoretical Probability Distribution of Number of Black Cards of Five Cards Drawn Without Replacement










Question: How different will be the observed from the theoretical discrete probability distribution of number of black cards in drawing five cards without replacement?

Chart 1 compares the observed and theoretical two-category discrete probability distributions of the number of black cards in drawing five cards without replacement.













The chart clearly shows the bell-shaped, symmetric line of the theoretical probability distribution and how much the observed distribution differs from it. Unlike the theoretical distribution, the observed distribution is positively skewed.
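One way to explore how the observed distribution depends on the number of events is a small simulation. The sketch below is not the author's experiment: it assumes a full 52-card deck is reshuffled before every event, and the function name `simulate`, the seed and the event counts are arbitrary choices for illustration.

```python
import random
from collections import Counter

def simulate(events, draws=5, seed=0):
    """Relative frequency of the number of black cards over many simulated
    events, each drawing `draws` cards without replacement from a full deck."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    deck = ["B"] * 26 + ["R"] * 26     # assumed deck: 26 black, 26 red
    counts = Counter()
    for _ in range(events):
        hand = rng.sample(deck, draws)  # sampling without replacement
        counts[hand.count("B")] += 1
    return {x: counts[x] / events for x in range(draws + 1)}

few = simulate(7)        # as few events as in the note
many = simulate(20000)   # many events

print(few)
print(many)
```

With only seven events the relative frequencies can sit far from the theoretical values, while with many events they tend to settle near them.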

Conclusion

The tree diagram, Binomial expansion, Binomial distribution function and Hypergeometric function are important tools to calculate the number of outcomes and the probability of samples drawn without replacement. The observed two-category probability distribution differs from the theoretical distribution. The observed data can differ from one event to another because each draw is random and because the conditions under which a card is drawn without replacement may not be uniform.
