Family Tree

About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) an MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) an MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) an MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; the Training Institute for Technical Instruction (TITI) from July to Sep. 2011; UN Women Nepal from Sep. to Dec. 2011; Mercy Corps Nepal from 24 Jan. 2012 to 14 August 2016; and CAMRIS International in Nepal from 1 February 2017. I have published articles to my credit.

Sunday, July 8, 2018

Multi-Category Discrete Probability Distribution of Sampling Without Replacement and Tree Diagram and Formula, Statistical Note 20


Among 40 participants in a training, 8 were youths (less than 30 years), 25 were adults (30 to 59 years) and the remaining 7 were senior citizens (60 or more years). Two participants are selected at random, one after another, without replacing the first selected participant. Calculate the probability distribution of the number of youths selected, where it does not matter whether a youth is selected in the first or the second draw.

To calculate the probability of a favorable event when objects are sampled without replacement from a finite population, we need to count the total number of outcomes and the number of favorable outcomes. A tree diagram helps to visualize the outcomes, count them and calculate their probabilities. As the number of selections increases, the diagram becomes complicated and it is no longer practical to show all outcomes in it. A formula is then used to count the outcomes and calculate the discrete probability distribution. I discuss how to visualize and calculate the multi-category discrete probability distribution of sampling without replacement using both the tree diagram and the formula.

Tree Diagram

This example has three categories, so the participant selected in any draw is a youth (Y), an adult (A) or a senior citizen (S). Two participants are selected consecutively without replacement, which is referred to as sampling without replacement. The probability of each outcome is presented in Diagram 1. For a detailed discussion of how the probabilities are derived, please refer to my statistical note 6.

















Diagram 1: First and second steps showing marginal and conditional probabilities (multi-category discrete probability distribution of sampling without replacement)

A total of nine outcomes appear in the rightmost cells of Diagram 1. One outcome (YY), in the red cell, has youths sampled in both selections. Two outcomes (YA and AY) in dark red cells and two outcomes (YS and SY) in dark green cells have one youth and one other participant sampled in the two selections. One outcome (AA) in the purple cell, two outcomes (AS and SA) in grey cells and one outcome (SS) in the yellow cell have no youth sampled in the two selections.

The joint probability that both the first and second selected participants are youths, denoted by P(Y∩Y) or P(YY), is the product of P(Y) and the conditional probability P(Y|Y), that is (8/40) X (7/39) = 56/1560, which is equal to 0.036. The other joint probabilities are calculated in the same way.
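The two steps of the tree can also be traced in a few lines of code. The sketch below is only an illustration, not part of the original note: it assumes the category counts used in the calculations (8 youths, 25 adults, 7 senior citizens), and the names `counts` and `joint_probabilities` are made up for this example.

```python
from fractions import Fraction

# Category counts as used in the worked calculations (assumed labels Y, A, S).
counts = {"Y": 8, "A": 25, "S": 7}

def joint_probabilities(counts):
    """Return P(first, second) for every ordered pair drawn without replacement."""
    total = sum(counts.values())
    joint = {}
    for first, n_first in counts.items():
        # First branch of the tree: marginal probability of the first draw.
        p_first = Fraction(n_first, total)
        for second, n_second in counts.items():
            # Second branch: one participant of category `first` is already gone.
            remaining = n_second - (1 if second == first else 0)
            p_second = Fraction(remaining, total - 1)
            joint[first + second] = p_first * p_second
    return joint

joint = joint_probabilities(counts)
for outcome, p in joint.items():
    print(outcome, p, round(float(p), 3))
```

Exact fractions are used so that the nine joint probabilities can be checked to add up to exactly 1 before any rounding.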

Let X be a random variable for the number of youths sampled without replacement in two selections, so that X takes the value 0, 1 or 2. The values of the probability mass function (PMF) are calculated below and presented in Table 1:

P(X=0) = P(AA)+P(AS)+P(SA)+P(SS) = [(25 X 24) + 2(25 X 7) + (7 X 6)] / (40 X 39) = (600+350+42) / (40 X 39) = 992 / 1560 = 0.636

P(X=1) = P(YA)+P(YS)+P(AY)+P(SY) = [2(8 X 25) + 2(8 X 7)] / (40 X 39) = (400+112) / (40 X 39) = 512 / 1560 = 0.328

P(X=2) = P(YY) = 56/ 1560 = 0.036
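The same sums can be obtained by walking every path of the tree and adding the path probabilities by the number of youths on the path. The following is a minimal sketch under the assumption of 8 youths, 25 adults and 7 senior citizens; the function name `pmf_by_enumeration` and its parameters are hypothetical. It accepts any number of draws, although the number of paths grows quickly, which is the practical limit of the tree approach.

```python
from fractions import Fraction
from itertools import product

def pmf_by_enumeration(counts, target, n_draws):
    """PMF of how many `target` items appear in n_draws draws without replacement."""
    total = sum(counts.values())
    pmf = {x: Fraction(0) for x in range(n_draws + 1)}
    # Each tuple is one ordered path through the tree, e.g. ("Y", "A").
    for outcome in product(counts, repeat=n_draws):
        remaining = dict(counts)
        p = Fraction(1)
        for step, category in enumerate(outcome):
            # Conditional probability of this branch given the earlier draws.
            p *= Fraction(remaining[category], total - step)
            remaining[category] -= 1
        pmf[outcome.count(target)] += p
    return pmf

pmf = pmf_by_enumeration({"Y": 8, "A": 25, "S": 7}, "Y", 2)
for x, p in pmf.items():
    print(x, p, round(float(p), 3))
```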

Table 1: Multi-Category Discrete probability distribution of youths sampled without replacement

Number of youths sampled (x)     P(X=x)
0                                0.636
1                                0.328
2                                0.036






The probability distribution shows that there is a 63.6 percent chance that no youth will be among the two participants selected. There is a 32.8 percent chance that exactly one of the two participants selected will be a youth, and a 3.6 percent chance that both will be youths. Adding the first two probabilities, there is a 96.4 percent chance that at most one youth will be selected. Adding all three, there is a 100 percent chance of sampling two or fewer youths in the selection of two participants without replacement.
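The cumulative figures are running totals of the PMF values. A small sketch of that addition, using the three fractions from the calculations above (the variable names are illustrative only):

```python
from fractions import Fraction
from itertools import accumulate

# PMF of the number of youths, taken from the worked calculations.
pmf = [Fraction(992, 1560), Fraction(512, 1560), Fraction(56, 1560)]

# Running totals give P(X <= x) for x = 0, 1, 2.
cdf = list(accumulate(pmf))

for x, (p, c) in enumerate(zip(pmf, cdf)):
    print(f"x={x}  P(X=x)={float(p):.3f}  P(X<=x)={float(c):.3f}")
```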


This example has four characteristic features. First, the example has a finite population of 40 participants, denoted by ‘N’. Second, there are more than two possible categories that are mutually exclusive in each trial or experiment. Third, the second trial depends on the outcome of the first trial, which is what sampling without replacement means. Fourth, each category can occur ‘0’ to ‘n’ times in ‘n’ trials.

Let Xi be the random variable for the number of times the ith category is sampled without replacement; it takes a value ‘ni’ from ‘0’ to ‘n’. The index ‘i’ runs from 1 to k, the number of categories. The joint probability distribution of X1, X2, …, Xk is given by the expression
P(X1=n1, X2=n2, X3=n3, …, Xk=nk) = [C(N1,n1) X C(N2,n2) X … X C(Nk,nk)]/C(N,n)

where,
N is the population size,
Ni is the size of the ith category in the population, so that N1 + N2 + … + Nk = N,
n is the total number of draws, or the sample size,
ni is the number of draws from the ith category, so that n1 + n2 + … + nk = n,
C(N,n) is the number of combinations of n objects drawn from a population of size N, also referred to as the binomial coefficient. For clarity on combinations and the binomial coefficient, refer to my statistical notes 10 to 16.

This distribution is referred to as the multivariate hypergeometric distribution.
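The expression translates almost directly into code. The sketch below uses only the Python standard library (`math.comb` and `math.prod`, available from Python 3.8); the function name `multivariate_hypergeom_pmf` and its argument names are my own choices for illustration, and the category sizes 8, 25 and 7 are those used in the calculations of this note.

```python
from fractions import Fraction
from math import comb, prod

def multivariate_hypergeom_pmf(category_sizes, draws):
    """P(X1=n1, ..., Xk=nk) = [C(N1,n1) x ... x C(Nk,nk)] / C(N,n)."""
    if len(category_sizes) != len(draws):
        raise ValueError("one draw count is needed per category")
    N = sum(category_sizes)   # population size
    n = sum(draws)            # sample size
    numerator = prod(comb(Nk, nk) for Nk, nk in zip(category_sizes, draws))
    return Fraction(numerator, comb(N, n))

# One youth and one adult among two participants drawn from 8 + 25 + 7 = 40.
p = multivariate_hypergeom_pmf([8, 25, 7], [1, 1, 0])
print(p, round(float(p), 3))
```

Because `comb(Nk, nk)` is 0 when nk is larger than Nk, an impossible sample automatically receives probability 0.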

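In this example, N is equal to 40 and n is equal to 2. X1, X2 and X3 are the numbers of youths, adults and senior citizens sampled, with category sizes N1 = 8, N2 = 25 and N3 = 7. Each of Y, A and S can occur 0 to 2 times. The outcomes are grouped and the probability mass function (PMF) is calculated for each group as below: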

Group One Outcome:
One outcome (AA) has adults sampled in both selections. The PMF is calculated as:
P(X1=0, X2=2, X3=0) = [C(8,0) X C(25,2) X C(7,0)]/C(40,2) = (25 X 24) / (40 X 39) = 0.385

Group Two Outcome:
One outcome (SS) has senior citizens sampled in both selections. The PMF is calculated as:
P(X1=0, X2=0, X3=2) = [C(8,0) X C(25,0) X C(7,2)]/C(40,2) = (7 X 6) / (40 X 39) = 0.027

Group Three Outcomes:
Two outcomes (AS and SA) have an adult and a senior citizen sampled in two selections. Because the formula ignores the order of selection, it covers both outcomes together, and the PMF is calculated as:
P(X1=0, X2=1, X3=1) = [C(8,0) X C(25,1) X C(7,1)]/C(40,2) = (2 X 7 X 25) / (40 X 39) = 0.224.

Group Four Outcomes:
Two outcomes (YA and AY) have a youth and an adult sampled in two selections. Because the formula ignores the order of selection, it covers both outcomes together, and the PMF is calculated as:
P(X1=1, X2=1, X3=0) = [C(8,1) X C(25,1) X C(7,0)]/C(40,2) = (2 X 8 X 25) / (40 X 39) = 0.256.

Group Five Outcomes:
Two outcomes (YS and SY) have a youth and a senior citizen sampled in two selections. Because the formula ignores the order of selection, it covers both outcomes together, and the PMF is calculated as:
P(X1=1, X2=0, X3=1) = [C(8,1) X C(25,0) X C(7,1)]/C(40,2) = (2 X 8 X 7) / (40 X 39) = 0.072.

Group Six Outcomes:
One outcome (YY) has youths sampled in both selections. The PMF is calculated as:
P(X1=2, X2=0, X3=0) = [C(8,2) X C(25,0) X C(7,0)]/C(40,2) = (8 X 7) / (40 X 39) = 0.036
  
The probabilities of the group one to three outcomes are added to calculate the probability that no youth is sampled in the two selections, which is equal to 0.636, the first row probability in Table 1. The probabilities of the group four and five outcomes are added to calculate the probability that one youth is sampled in the two selections, which is equal to 0.328, the second row probability in Table 1. The probability of the group six outcome is equal to the third row probability in Table 1.
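The grouping and the additions can be reproduced with a short script. It is a sketch only, assuming the category sizes 8, 25 and 7 from the calculations above, and the names `mvh_pmf`, `table` and `marginal` are hypothetical. As an extra check that is not part of the original note, it compares the result with the ordinary hypergeometric distribution obtained by merging adults and senior citizens into one group of 32 non-youths.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

sizes = (8, 25, 7)   # youths, adults, senior citizens
n = 2                # number of participants selected

def mvh_pmf(sizes, draws):
    """Multivariate hypergeometric probability of the draw counts in `draws`."""
    numerator = prod(comb(Nk, nk) for Nk, nk in zip(sizes, draws))
    return Fraction(numerator, comb(sum(sizes), sum(draws)))

table = {}                                         # the six groups of outcomes
marginal = {x: Fraction(0) for x in range(n + 1)}  # PMF of the number of youths
for draws in product(range(n + 1), repeat=len(sizes)):
    if sum(draws) == n:
        table[draws] = mvh_pmf(sizes, draws)
        marginal[draws[0]] += table[draws]         # draws[0] is the youth count

# Youths versus everyone else: C(8,x) x C(32,n-x) / C(40,n).
non_youths = sum(sizes) - sizes[0]
check = {x: Fraction(comb(sizes[0], x) * comb(non_youths, n - x), comb(sum(sizes), n))
         for x in range(n + 1)}

for x in marginal:
    print(x, round(float(marginal[x]), 3), round(float(check[x]), 3))
```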

The processes above show that the tree diagram and the formula produce the same values, and that both are useful for calculating the multi-category discrete probability distribution of samples drawn without replacement.
