Family Tree


About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) an MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) an MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) an MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; the Training Institute for Technical Instruction (TITI) from July to Sep. 2011; UN Women Nepal from Sep. to Dec. 2011; Mercy Corps Nepal from 24 Jan. 2012 to 14 August 2016; and CAMRIS International in Nepal from 1 February 2017. I have published articles to my credit.

Sunday, August 12, 2018

Theoretical and Observed Two-category Discrete Probability Distributions Without Replacement, Statistical Note 32

Draw 20 cards without replacement from a deck of 52 cards and count the number of black cards. Repeat the process seven times, so that there are seven sets of 20 cards each. Calculate the theoretical and observed discrete probability distributions of the number of black cards in 20 cards.

A theoretical probability distribution describes the ideal distribution, that is, what the distribution should be given its parameters. The observed probability distribution is based on actual trial data and shows how far the data depart from that ideal. The sampling distribution helps compare the theoretical and observed distributions.

Drawing some cards without replacement from a deck of 52 cards is an example of the two-category discrete probability distribution of sampling without replacement. Refer to my earlier Statistical Notes for clarity on calculating the two-category discrete probability using a tree diagram, a formula and an Excel function.
  
Theoretical Discrete Probability Distribution

I discussed the Theoretical Two-Category Discrete Probability Distribution of sampling without replacement in my previous Statistical Note 31. Here, I present only the table of the number of black cards in 20 cards drawn without replacement from a deck of cards and the respective probabilities (Table 1).

Table 1: Number of Black Cards in 20 Cards Drawn Without Replacement from a Deck of Cards and Respective Probabilities

















The occurrence of 10 black cards in 20 cards has the highest probability (highlighted yellow) and is thus the most likely outcome. The likelihood decreases on both sides of 10 black cards. The two extreme numbers of black cards, 0 and 20, have the least chance of occurrence.

Trial Data

I drew 20 cards without replacement from a deck of 52 cards to form one set, and repeated the same process until I had seven sets. Table 2 presents the outcome of the 20 cards drawn without replacement in each of the seven sets. Black and red cards were coded one and zero respectively for symbolic representation.
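The drawing procedure can also be imitated on a computer. The following is only a minimal sketch in Python, not the physical trial reported here: the function name `draw_black_count` and the seed value are my own assumptions, and the simulated counts will not match the counts in Table 2.

```python
import random


def draw_black_count(rng, sample_size=20):
    """Draw cards without replacement and count the black ones (coded 1)."""
    deck = [1] * 26 + [0] * 26            # 26 black cards (1) and 26 red cards (0)
    hand = rng.sample(deck, sample_size)  # random.sample draws without replacement
    return sum(hand)                      # summing the codes counts the black cards


rng = random.Random(32)  # arbitrary seed, used only so the run is repeatable
simulated_counts = [draw_black_count(rng) for _ in range(7)]  # seven sets of 20 cards
print(simulated_counts)
```

Each run with a different seed gives a different list of seven counts, which is the set-to-set variation the note goes on to describe.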

Table 2: Outcomes in 20 cards drawn without replacement from a deck of cards in each of seven sets (Black card=1 and Red card=0)


















To summarize, the number of black cards in the seven sets was six, seven, nine, or between 11 and 13 (Table 2). This variation from set to set is due to sampling error. The observed mean number of black cards is the sum of the number of black cards from each of the seven sets divided by seven, which equals 10.

Observed Discrete Probability Distribution

In further summary, 12 black cards occurred in two of the seven sets of 20 cards (Table 3). Thus, 12 black cards was the most frequent observed outcome, with observed probability P(X=12) = 2/7 = 0.285714. In the other five sets, the number of black cards in 20 cards occurred only once each.
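The observed distribution can be tabulated with a few lines of Python. This is a sketch under an assumption: the list `black_counts` below is my reading of the summary above (six, seven, nine, and 11 to 13, with 12 occurring twice), not a copy of the original table.

```python
from collections import Counter
from fractions import Fraction

# Assumed counts of black cards in the seven sets, as read from the summary text.
black_counts = [6, 7, 9, 11, 12, 12, 13]
n_sets = len(black_counts)

# Observed mean: sum of the counts divided by the number of sets.
observed_mean = Fraction(sum(black_counts), n_sets)

# Observed probability of each count: its frequency divided by the number of sets.
observed_probs = {
    x: Fraction(freq, n_sets) for x, freq in sorted(Counter(black_counts).items())
}

for x, prob in observed_probs.items():
    print(f"P(X={x}) = {float(prob):.6f}")
print("Observed mean =", float(observed_mean))
```

With these assumed counts the mean is 10 and the observed probability of 12 black cards is 2/7, in agreement with the figures quoted in the text.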

Table 3: Number of black cards in 20 cards drawn without Replacement in each of seven samples and probability

Difference between Theoretical and Observed Discrete Probability Distributions

Chart 1 compares the theoretical and observed two-category discrete probability distributions of black cards in 20 cards drawn without replacement from a deck of cards. The theoretical distribution appears as a symmetric, bell-shaped line chart, and the observed distribution departs visibly from it.













Conclusion

The theoretical two-category probability distribution differs from the observed distribution. The observed data can differ from one set to another because of sampling error and any non-uniformity in the conditions under which the cards are drawn without replacement from a deck of cards.

Theoretical Two-Category Discrete Probability Distribution Without Replacement, Statistical Note 31

Calculate the theoretical discrete probability distribution of black cards in 20 cards drawn without replacement from a deck of 52 cards.

A theoretical probability distribution describes the ideal distribution, that is, what the distribution should be given its parameters. The distribution of black cards in 20 cards drawn without replacement from a deck of cards is an example of the two-category discrete probability distribution of sampling without replacement. Refer to my earlier Statistical Notes for clarity on calculating the two-category discrete probability of samples drawn without replacement using a tree diagram, a formula and an Excel function.

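Why were 20 cards chosen in this example?

One may ask why 20 cards were chosen rather than some other number. Drawing 20 cards without replacement from a deck of 52 cards is a case of the Hypergeometric Distribution, which is symmetric here because exactly half of the 52 cards are black. With an even sample size such as 20, the distribution has a single central value, 10 black cards, with equally many possible values on either side.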

Theoretical Discrete Probability Distribution Without Replacement

This example has three characteristic features. First, the example has a finite population of 52 cards, denoted by ‘N’. Second, each card can be characterized as a success or a failure. Since the question asks for the probability of black cards, the selection of a black card is considered a success, and there are 26 black cards in the population, denoted by ‘M’. Third, a sample of 20 cards, denoted by ‘n’, is drawn without replacement in such a way that each sample of 20 cards is equally likely to be selected.

Let X be the random variable of interest, the number of black cards in the sample of 20 cards drawn without replacement. X takes one of the values from 0 to 20, denoted by ‘x’. The probability distribution of X depends on the parameters ‘n’, ‘M’ and ‘N’, and is given by the expression
P(X=x) = h(x;n,M,N) = (number of outcomes having X=x) / (total number of outcomes)
P(X=x) = h(x;n,M,N) = [C(M,x) × C(N-M,n-x)] / C(N,n)
This distribution is referred to as the Hypergeometric distribution.

In this example, n=20, M=26, N=52 and ‘x’ takes the values 0 to 20. Putting these values in the above formula for x=10, one gets
P(X=10) = [C(26,10) × C(26,10)] / C(52,20) = (26! × 26! × 32! × 20!) / (16! × 10! × 16! × 10! × 52!) = 0.223934379

This value is equal to the one presented in Table 1. In the same way, the probability for other numbers of black cards in 20 cards drawn without replacement can be calculated.
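The same formula can be evaluated for every value of ‘x’ with a short program. The sketch below uses Python's standard `math.comb` for C(n,k); the function name `hypergeom_pmf` is my own choice for illustration and is not part of any library.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """P(X=x) = [C(M,x) * C(N-M,n-x)] / C(N,n) for sampling without replacement."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


n, M, N = 20, 26, 52  # sample size, black cards in the deck, cards in the deck

# Probability of each possible number of black cards, 0 to 20.
distribution = {x: hypergeom_pmf(x, n, M, N) for x in range(n + 1)}

for x, prob in distribution.items():
    print(f"{x:2d} black, {n - x:2d} red: {prob:.9f}")
print("Total probability:", round(sum(distribution.values()), 9))
```

Because half of the deck is black, the probability of ‘x’ black cards equals the probability of ‘20 - x’ black cards, so only the values for 0 to 10 need to be computed by hand.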

Table 1: Number of success, failure and Probability of successes in 20 cards drawn without replacement from a deck of 52 cards



















Graphical Presentation

Chart 1 shows the theoretical two-category discrete probability distribution of black cards in 20 cards drawn without replacement from a deck of cards. The line chart of the theoretical probability distribution is symmetric and bell-shaped.













Conclusion

Looking at the table and the chart, one can see that the occurrence of 10 black cards in 20 cards drawn without replacement is the most likely outcome. The two extreme numbers of black cards, 0 and 20, have the least chance of occurrence.

Saturday, August 11, 2018

Theoretical and Observed Two-category Discrete Probability Distributions With Replacement, Statistical Note 30

Toss a coin 20 times and count the number of heads. Repeat the process seven times, so that there are seven sets of 20 tosses each. Calculate the theoretical and observed discrete probability distributions of the number of heads.

A theoretical probability distribution describes the ideal distribution, that is, what the distribution should be given its parameters. The observed probability distribution is based on actual trial data and shows how far the data depart from that ideal. The sampling distribution helps compare the theoretical and observed distributions.

Tossing a coin is an example of the two-category discrete probability distribution of sampling with replacement. Refer to my earlier Statistical Notes for clarity on calculating the two-category discrete probability using a tree diagram, a formula and an Excel function.

Theoretical Discrete Probability Distribution

I discussed the calculation of the Theoretical Two-Category Discrete Probability Distribution in my previous Statistical Note 29. Here, I present only the table of the number of heads in 20 tosses and the respective probabilities (Table 1).

Table 1: Type and Probability of Outcomes in 20 tosses of a coin


















Turning up of 10 heads in 20 tosses has the highest probability and is thus the most likely outcome. The likelihood decreases on both sides of the mean value. The two extreme numbers of heads, 0 and 20, have the least chance of occurrence.

The theoretical or population mean is the product of the number of tosses ‘n’ and the probability ‘p’ that a head turns up. This is denoted by ‘np’ and equals 20 multiplied by half, that is, 10. In this example the mean is also the value with the highest chance of occurrence.

The population variance is denoted by ‘npq’, where ‘q’ is the probability that a tail turns up, also equal to half. Using the values from this example, the population variance equals five. The population standard deviation, the square root of the variance, is 2.2361.
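These population values are simple to verify. The lines below are a small Python sketch of the arithmetic, with ‘q’ taken as one minus ‘p’; the variable names are my own.

```python
from math import sqrt

n = 20     # number of tosses
p = 0.5    # probability that a head turns up
q = 1 - p  # probability that a tail turns up

mean = n * p          # population mean, np
variance = n * p * q  # population variance, npq
std_dev = sqrt(variance)  # population standard deviation

print("Mean:", mean)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 4))
```

The standard deviation is the square root of five, which is about 2.2361.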

Trial Data

I tossed a coin 20 times to form one set, and repeated the same process until I had seven sets. Table 2 presents the outcome of the 20 tosses of a coin in each of the seven sets. Head and tail were coded one and zero respectively for symbolic representation.

Table 2: Outcomes in 20 tosses of a coin in each of seven sets (head=1 and tail=0)

















To summarize, the number of heads in the seven sets was seven, or between nine and 13 (Table 2). This variation from set to set is due to sampling error. The observed mean number of heads is the sum of the number of heads from each of the seven sets divided by seven. This value is 10.43, which is more than the population or theoretical mean of 10. It shows the difference between the theoretical and observed means.

Observed Discrete Probability Distribution

In further summary, 11 heads turned up in two of the seven sets of 20 tosses (Table 3). Thus, 11 heads was the most frequent observed outcome, occurring two out of seven times, with observed probability P(X=11) = 2/7 = 0.285714. In the other five sets, the number of heads in 20 tosses occurred only once each.

Table 3: Number of heads out of 20 tosses of a coin in each of seven samples and probability







Difference between Theoretical and Observed Discrete Probability Distributions

Chart 1 compares the theoretical and observed two-category discrete probability distributions of heads in 20 tosses of a coin. The theoretical distribution appears as a symmetric, bell-shaped line chart, and the observed distribution departs visibly from it.














Conclusion

The theoretical two-category probability distribution differs from the observed distribution. The observed data can differ from one set to another because of sampling error and any non-uniformity in the conditions under which the coin is tossed repeatedly.

Friday, August 10, 2018

Theoretical Two-Category Discrete Probability Distribution With Replacement, Statistical Note 29

Calculate the theoretical discrete probability distribution of the number of heads in 20 tosses of a coin.

A theoretical probability distribution describes the ideal distribution, that is, what the distribution should be given its parameters. Tossing a coin is an example of the two-category discrete probability distribution of sampling with replacement. Refer to my earlier Statistical Notes for clarity on calculating the two-category discrete probability using a tree diagram, a formula and an Excel function.

Why were 20 tosses chosen in this example?

One may ask why 20 tosses were chosen rather than some other number. Tossing an unbiased coin in ‘n’ independent trials is a case of the Binomial Distribution, which is symmetric when the probability of success ‘p’ is half, as it is for an unbiased coin. When ‘n’ is large, the Binomial distribution tends to the Normal Distribution with the mean number of successes equal to ‘np’. A common guide is that the mean should be at least five. For an unbiased coin, the number of independent trials should therefore be at least 10 to give a mean of five, and 20 tosses exceed that minimum. Second, why 20 tosses rather than another number above the minimum? With 20 tosses the mean number of successes is 10, a whole number midway between no heads and all 20 heads, so it divides the possible numbers of heads equally into a lesser side and a greater side.

Theoretical Discrete Probability Distribution

The probability of success, that a head turns up in a toss of a coin, is denoted by ‘p’ and is equal to half. Likewise, the probability of failure, that a tail turns up, is denoted by ‘q’ and is also half.

Let X be the random variable of interest, the number of heads that turn up in 20 tosses of a coin. X takes one of the values from 0 to 20, denoted by ‘x’, because the number of heads in 20 tosses can vary from not even a single head to all 20 heads. If the tail turns up in every toss, the total number of heads is zero; this is certain only if the coin is fully biased towards the tail. Likewise, if the head turns up in every toss, the total number of heads is 20, which is certain only if the coin is fully biased towards the head. These are the two most extreme cases. In other instances, the number of heads lies between these two extremes.

The Binomial distribution function is used to calculate the theoretical two-category discrete probability distribution of this example. The probability distribution of X depends on the index ‘n’ and parameter ‘p’, and is given by the expression: P(X=x) = C(n,x) × p^x × q^(n-x). One can calculate the theoretical probability distribution in different ways.

Way One: One way is to count the number of possible outcomes for a category of outcomes and the total number of outcomes. The total number of outcomes is calculated using the formula ‘number of possible outcomes of a trial raised to the power of the number of trials’. In this example, the total number of outcomes is 2^n, which is 2^20, equal to 1,048,576.

The number of outcomes for a certain outcome type can be calculated using the Binomial coefficient. To find the number of outcomes consisting of 10 heads and 10 tails in 20 tosses, I calculate the Binomial coefficient C(20,10), which equals 184,756.

The probability of 10 heads in 20 tosses is the number of outcomes with 10 heads divided by the total number of outcomes: 184,756 divided by 1,048,576, equal to 0.176197. Likewise, the number of possible outcomes for each outcome category is calculated as presented in Table 1, where the entry for 10 heads is highlighted yellow.

Way Two: Using the Binomial formula, the probability of, for example, 10 heads in 20 tosses is calculated as C(20,10) × (0.5)^10 × (0.5)^10, equal to 0.176197, highlighted yellow in Table 1. This formula can be applied to every possible number of heads in 20 tosses to calculate the probability distribution manually.

Way Three: An Excel function can be used for this example. The total probability over all outcomes, from not even a single head to all 20 heads in 20 tosses of a coin, is one. Turning up of 10 heads out of 20 tosses has the highest probability, denoted by P(X=10), equal to 0.176197 and highlighted yellow in Table 1.
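The counting approach and the formula approach can be checked against each other in a few lines. This is a sketch in Python rather than Excel; the function names `prob_by_counting` and `prob_by_formula` are my own labels for the first two ways described above.

```python
from math import comb

n = 20     # number of tosses
p = 0.5    # probability of a head
q = 1 - p  # probability of a tail

total_outcomes = 2 ** n  # two possible outcomes per toss, n tosses


def prob_by_counting(x):
    """Outcomes with x heads divided by the total number of outcomes."""
    return comb(n, x) / total_outcomes


def prob_by_formula(x):
    """Binomial formula: C(n,x) * p^x * q^(n-x)."""
    return comb(n, x) * p**x * q ** (n - x)


for x in range(n + 1):
    print(f"{x:2d} heads: {comb(n, x):6d} outcomes, "
          f"P = {prob_by_counting(x):.6f} / {prob_by_formula(x):.6f}")
print("Total probability:", sum(prob_by_formula(x) for x in range(n + 1)))
```

The two columns agree because, for an unbiased coin, p^x × q^(n-x) is the same as one divided by 2^n for every ‘x’. For a biased coin only the formula approach would apply, since the outcomes would no longer be equally likely.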

Table 1: Type, Number and Probability of Outcomes in 20 tosses of a coin




















Graphical Presentation
Chart 1 shows the theoretical two-category discrete probability distribution of the number of heads in 20 tosses of a coin. The line chart of the theoretical probability distribution is symmetric and bell-shaped.














Conclusion
Looking at the table and the chart, one can see that turning up of 10 heads in 20 tosses is the most likely outcome. This is because, as I discussed above, the mean value is 10. The likelihood decreases on both sides of the mean value. The two extreme numbers of heads, 0 and 20, have the least chance of occurrence. Since the number of tosses is well above the minimum of 10 tosses needed for a mean of five, the Binomial distribution is approximated well by the Normal distribution.
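How close the approximation is can be seen by placing the Binomial probabilities next to the Normal density with the same mean and standard deviation. The sketch below is my own illustration using Python's standard `statistics.NormalDist`; comparing the Normal density at each whole number with the Binomial probability is a simple, informal check, not a formal test.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
q = 1 - p

# Normal distribution with mean np and standard deviation sqrt(npq).
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * q))


def binomial_prob(x):
    """Binomial probability of x heads in n tosses."""
    return comb(n, x) * p**x * q ** (n - x)


# Largest gap between the Binomial probability and the Normal density.
largest_gap = max(abs(binomial_prob(x) - normal.pdf(x)) for x in range(n + 1))

for x in range(n + 1):
    print(f"{x:2d} heads: binomial {binomial_prob(x):.6f}, normal {normal.pdf(x):.6f}")
print("Largest gap:", round(largest_gap, 6))
```

The two columns stay close across the whole range, with the largest gap near the centre of the distribution.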