Family Tree

About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) an MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) an MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) an MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation in agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; the Training Institute for Technical Instruction (TITI) from July to Sep. 2011; UN Women Nepal from Sep. to Dec. 2011; Mercy Corps Nepal from 24 Jan. 2012 to 14 August 2016; and CAMRIS International in Nepal, commencing 1 February 2017. I have published articles to my credit.

Saturday, July 28, 2018

Converting Multi-category to Two-category Discrete Probability Distribution of Sampling Without Replacement, Statistical Note 27

Among 40 participants in a training, 8 were youths (less than 30 years), 25 were adults (30 to 59 years) and the remaining 7 were senior citizens (60 or more years). Two participants are selected at random, one after another, without replacing the name of the first selected participant. Calculate the probability that exactly one of the two selected participants is a youth.

The multiple categories of a random variable can be reduced to two categories of an outcome in dependent experiments or trials without replacement. This is important because the two-category discrete probability distribution is the most commonly used, as one is usually concerned with the probability of a success or a failure event. Thus, the Hypergeometric distribution function can be used instead of the Multivariate Hypergeometric distribution function. I will discuss the process of reducing the multiple categories to two categories and calculating the two-category discrete probability distribution without replacement using a tree diagram, a formula and an Excel software function. For details on two-category and multi-category discrete probability distributions, refer to my statistical notes 17 to 25.
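As a rough check of this reduction, the sketch below compares two routes for the training example. The first sums the Multivariate Hypergeometric probabilities over the outcomes that contain exactly one youth. The second merges adults and senior citizens into a single 'others' group and applies the two-category Hypergeometric formula. The function and variable names are illustrative assumptions, not taken from any library.

```python
from math import comb, isclose

# Group sizes from the example: 8 youths, 25 adults, 7 senior citizens.
YOUTHS, ADULTS, SENIORS = 8, 25, 7
TOTAL = YOUTHS + ADULTS + SENIORS   # 40 participants
DRAWS = 2                           # sample size, drawn without replacement


def multivariate_hypergeom(y, a, s):
    """P(y youths, a adults, s seniors) in a sample of size y + a + s."""
    ways = comb(YOUTHS, y) * comb(ADULTS, a) * comb(SENIORS, s)
    return ways / comb(TOTAL, y + a + s)


def two_category_hypergeom(x):
    """P(x youths) after merging adults and seniors into 'others'."""
    others = ADULTS + SENIORS
    return comb(YOUTHS, x) * comb(others, DRAWS - x) / comb(TOTAL, DRAWS)


# Exactly one youth: the other participant is either an adult or a senior.
p_multi = multivariate_hypergeom(1, 1, 0) + multivariate_hypergeom(1, 0, 1)
p_two = two_category_hypergeom(1)

print(round(p_multi, 4), round(p_two, 4))  # both round to 0.3282
```

Both routes give the same value because the adult and senior citizen outcomes are simply added together once they are treated as one failure category.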

Tree Diagram

A tree diagram is an important means to visualize the outcomes, count them and calculate their probabilities. There are only two categories, success and failure, in each selection of a participant without replacement. Let X be the event that a youth is selected in a random selection of a participant, also referred to as the success. The marginal probability of X, denoted by P(X) and equal to p, is eight youths divided by the total of 40 participants, which is equal to 0.20 in the first selection of a participant. Let Y be the failure event, which consists of the other participants (adults and senior citizens). Thus, the marginal probability of failure, denoted by P(Y) and equal to q, is 32 other participants divided by the total of 40 participants, equal to 0.80 in the first selection of a participant. Because only 39 participants remain for the second selection, the conditional probabilities P(X/X) = 7/39, P(Y/X) = 32/39, P(X/Y) = 8/39 and P(Y/Y) = 31/39 in the second selection differ from the marginal probabilities in the first selection (Diagram 1).












Diagram 1: Marginal and conditional probabilities for two age groups of participants sampled without replacement

The joint probability that event X occurs in the first selection and event Y occurs in the second selection, denoted by P(X∩Y), is the product of P(X) and P(Y/X), which is equal to (8 × 32) divided by (40 × 39), or 0.1641. Likewise, the joint probability that event Y occurs in the first selection and event X occurs in the second selection, denoted by P(Y∩X), is the product of P(Y) and P(X/Y), which is also equal to (8 × 32) divided by (40 × 39), or 0.1641. Thus, the total probability that exactly one of the two selected participants is a youth is the sum of P(X∩Y) and P(Y∩X), which is equal to 0.3282.
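The two branches of the tree that contain exactly one youth can be traced in a few lines of code. This is a minimal sketch using exact fractions; the variable names are assumptions chosen to mirror the notation above.

```python
from fractions import Fraction

N_TOTAL = 40                   # all participants
N_YOUTH = 8                    # successes (event X)
N_OTHER = N_TOTAL - N_YOUTH    # failures (event Y): adults and senior citizens

# First selection: marginal probabilities.
p_x = Fraction(N_YOUTH, N_TOTAL)   # P(X) = 8/40
p_y = Fraction(N_OTHER, N_TOTAL)   # P(Y) = 32/40

# Second selection: conditional probabilities, with 39 participants remaining.
p_y_given_x = Fraction(N_OTHER, N_TOTAL - 1)   # P(Y/X) = 32/39
p_x_given_y = Fraction(N_YOUTH, N_TOTAL - 1)   # P(X/Y) = 8/39

# Joint probabilities of the two branches with exactly one youth.
p_xy = p_x * p_y_given_x   # youth first, other second
p_yx = p_y * p_x_given_y   # other first, youth second

p_one_youth = p_xy + p_yx
print(p_xy, p_yx, p_one_youth, round(float(p_one_youth), 4))
```

Working with fractions keeps the branch probabilities exact (32/195 each), so the rounding to 0.3282 happens only at the final step.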

Formula

A formula is another means of calculating the discrete probability distribution. Let X be the random variable of interest, the number of youths in the sample of two participants sampled without replacement. It takes one of the values 0, 1 or 2, denoted by ‘x’. The probability distribution of X depends on three parameters: ‘n’, the sample size; ‘M’, the number of successes in the population; and ‘N’, the population size. C(a,b) denotes the number of combinations of ‘b’ items chosen from ‘a’. The distribution is given by the expression

P(X=x) = h(x;n,M,N) = [C(M,x) × C(N-M,n-x)]/C(N,n)

This distribution is referred to as the Hypergeometric distribution.

In this example, n=2, M=8 and N=40, and ‘x’ takes the value 1. Putting these values in the above formula, one gets

P(X=1) = [C(8,1) × C(32,1)]/C(40,2) = (8 × 32 × 2)/(40 × 39) = 0.3282.
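The formula translates directly into code. The sketch below is one possible implementation that uses Python's standard `math.comb` for C(a,b); the function name `hypergeom_pmf` is an assumption made for illustration. It evaluates all three possible values of ‘x’, not only x=1, so that the probabilities can be checked to sum to one.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N) = C(M, x) * C(N - M, n - x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Training example: sample of n=2 from N=40 participants, M=8 of them youths.
distribution = {x: hypergeom_pmf(x, n=2, M=8, N=40) for x in range(3)}

for x, p in distribution.items():
    print(f"P(X={x}) = {p:.4f}")
# P(X=0) = 0.6359
# P(X=1) = 0.3282
# P(X=2) = 0.0359
```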

EXCEL Function

Excel software is commonly available on desktop and laptop computers and is a convenient means to calculate the discrete probability distribution. Excel has the ‘HYPGEOM.DIST’ function, which has five fields, ‘Sample_s’, ‘Number_sample’, ‘Population_s’, ‘Number_pop’ and ‘Cumulative’, as shown in the function argument box of Diagram 2.





















Diagram 2: ‘HYPGEOM.DIST’ Function Arguments Using Dataset in Excel Worksheet and using ‘FALSE’ logical value in the field ‘Cumulative’

The field ‘Sample_s’ takes the number of successes in the sample. In this example, this field takes the value of one youth, as shown in cell B3 of the table as well as in the argument box in Diagram 2. The field ‘Number_sample’ is the sample size, two participants sampled without replacement. The field ‘Population_s’ is the number of successes in the population, which is 8 youths in this example. The field ‘Number_pop’ is the population size, the total of 40 participants. The field ‘Cumulative’ is a logical value that determines the form of the function. If ‘Cumulative’ is ‘FALSE’, ‘HYPGEOM.DIST’ calculates the probability mass function (PMF), which gives the probability associated with the value assigned to the field ‘Sample_s’ as the number of successes.

With all five fields filled in the function arguments, the ‘HYPGEOM.DIST’ function returned a PMF value of 0.3282. It means that there is a 32.8 percent chance that exactly one of the two participants sampled without replacement will be a youth.
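For readers without Excel at hand, the five fields can be mimicked with a small helper. The sketch below is not Excel's implementation; `hypgeom_dist` is a hypothetical function written for this note that simply takes its arguments in the same order as ‘HYPGEOM.DIST’. It assumes that a ‘TRUE’ value in ‘Cumulative’ means the probability of at most ‘Sample_s’ successes.

```python
from math import comb


def hypgeom_dist(sample_s, number_sample, population_s, number_pop, cumulative):
    """Hypothetical stand-in that follows the HYPGEOM.DIST argument order."""

    def pmf(x):
        # Probability of exactly x successes in the sample.
        return (comb(population_s, x)
                * comb(number_pop - population_s, number_sample - x)
                / comb(number_pop, number_sample))

    if cumulative:
        # Assumed meaning of TRUE: probability of at most sample_s successes.
        return sum(pmf(x) for x in range(sample_s + 1))
    return pmf(sample_s)


# Same inputs as the worked example: 1 youth, sample of 2, 8 youths, 40 participants.
pmf_value = hypgeom_dist(1, 2, 8, 40, False)
cdf_value = hypgeom_dist(1, 2, 8, 40, True)

print(round(pmf_value, 4))  # 0.3282
print(round(cdf_value, 4))  # 0.9641
```

The cumulative value of 0.9641 is the chance of selecting at most one youth, that is, the sum of the probabilities of zero and one youth.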

The probability calculated using the Excel function is equal to the values calculated in the tree diagram and formula sections above. The discussion in this note shows that multiple categories can be reduced to two categories, in which one category is considered the success event and the other the failure event. The Hypergeometric distribution can then be applied to the two-category discrete probability distribution without replacement. This reduces the need for the Multivariate Hypergeometric probability distribution function. Another lesson is that manual and automated calculations produce the same values, and both are useful for calculating the discrete probability distribution without replacement. Conceptual understanding is the backbone and automation is efficient. Thus, both are important knowledge and skill sets.
