Family Tree

Family Tree

About Me

My photo
Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I had worked in Lumle Agricultural Research Centre, western Nepal from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal from Mar. 2003 to June 2006 and WTLCP in far-western Nepal from June 2006 to Jan. 2011, Training Institute for Technical Instruction (TITI) from July to Sep 2011, UN Women Nepal from Sep to Dec 2011 and Mercy Corps Nepal from 24 Jan 2012 to 14 August 2016 and CAMRIS International in Nepal commencing 1 February 2017. I have published articles to my credit.

Sunday, July 15, 2018

Multicategory Discrete Probability Distribution of Sampling Without Replacement and Excel Software, Statistical Note 25

Among 40 participants in a training, 8 were youths (less than 30 years), 25 were adults (30 to 59 years) and remaining 7 were senior citizens (60 or more years). 2 participants are selected at random one after another without replacement of the name of the first selected participant. Calculate the probability distribution of youths in which the order does not matter whether the youths are sampled or not in the first or second draw.

Counting the total and favorable numbers of outcomes constituting the specified number of objects sampled without replacement from the finite population is important to calculate the probability of favorable events. Excel software is commonly available in the desktop or the laptop and is an important means to calculate the discrete probability distribution. I take an example from my Statistical Note 20, and will show how calculate the multicategory discrete probability distribution without replacement using Excel function. Refer to that note for conceptual clarity on using the tree diagram and formula to calculate the Multivariate Hypergeometric probability distribution.

EXCEL Software

EXCEL software does not have an inbuilt function to calculate Multivariate Hypergeometric probability. One inbuilt EXCEL function is used to calculate the Multivariate Hypergeometric probability – COMBIN function.

First, I will prepare the dataset for an outcome constituting one youth and one adult in two participants sampled without replacement as shown in Diagram 1. It is the case of group two outcomes discussed under Formula section in the Statistical Note 20.
















Diagram 1: Dataset showing the number of participants by age category and two participants sampled without replacement (one youth and one adult)

Referring to the Multivariate Hypergeometric distribution formula, the COMBIN function is used for each category of participants in the population and in the sample drawn without replacement as shown in Diagram 2. The COMBIN function has two fields - ‘Number’ and ‘Number_choosen’. In Diagram 2, ‘Number’ takes the value as the number of items by certain category in the population. In this example, eight are youths among 40 participants, and one of two participants sampled is youth as shown in cells B2 and B6 respectively in Diagram 2. The same formula is applied to all categories of participants and total participants.











Diagram 2: Application of COMBIN Function in Excel

The PRODUCT function is applied to calculate the product of a series of numbers. The function has several fields – ‘Number1’, ‘Number2’. Referring to the Multivariate Hypergeometric distribution formula, in this example, the ‘Number1’ field takes a series of values from cells B10 to B12 on COMBIN values as shown in Diagram 3. 











Diagram 3: Application of PRODUCT Function in Excel

The probability of an outcome, an occurrence of one youth and another adult in participants sampled without replacement among 40 participants is equal to P(1,1,0) discussed under Group Four Outcomes under Formula section of the Statistical Note 20. This value is calculated by dividing the value in cell B14 by the value in B13, which is equal to 0.2564 as shown in the function argument box in Diagram 3.

Similar process is followed to calculate the probability of each group of outcomes, mentioned in the Statistical Note 20. Finally, adding the probability of each group of outcomes will give the Multivariate Hypergeometric probability distribution.

Discussion in this note and my former Statistical Note 20 indicate that both manual and auto calculation produce the same values and are useful to calculate the multicategory discrete probability distribution without replacement, Multivariate Hypergeometric Distribution. Conceptual understanding is a backbone and automatization is efficient. Thus, both are important knowledge and skill sets.

No comments:

Post a Comment