Family Tree

About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; Training Institute for Technical Instruction (TITI) from July to Sep. 2011; UN Women Nepal from Sep. to Dec. 2011; Mercy Corps Nepal from 24 Jan. 2012 to 14 August 2016; and CAMRIS International in Nepal from 1 February 2017. I have published articles to my credit.

Wednesday, November 27, 2019

Conditional and Joint Probabilities from an Exemplary Survey Dataset, Statistical Note 46

Understanding joint and conditional probabilities, and being able to calculate them from a given dataset, is important when working with real data.

An exemplary survey dataset consists of one hundred records of randomly sampled respondents categorized by smoking habit (smokers or non-smokers) and food habit (vegetarians or non-vegetarians). A part of the dataset in the value label view of SPSS is shown in Table 1. Calculate the probability that a randomly sampled respondent is a non-smoker who is a non-vegetarian.

Table 1: Part of a dataset with smoking and food habits

Concept

Calculating the probability of an "AND" compound event, for example that a randomly selected respondent is a smoker who is a vegetarian, also involves calculating the probabilities of other events. Several concepts are introduced while answering this question.

This example has two discrete random variables, or categorical variables, each with two mutually exclusive response categories. One categorical variable is the smoking habit of a randomly sampled respondent, which has two categories: smoker (S) or non-smoker (NS). The other categorical variable is the food habit, which also has two mutually exclusive categories: vegetarian (V) or non-vegetarian (NV).

Simple or marginal probability: Let ‘S’ be the event that a randomly sampled respondent is a smoker. The probability of this event, represented by P(S), is the total number of smokers divided by the total number of respondents. It is also referred to as the relative frequency. P(NS), P(V) and P(NV) are calculated similarly.

Conditional probability: Let V/S be the event that a respondent is a vegetarian given that the respondent is a smoker. The conditional probability of vegetarians among smokers, symbolized by P(V/S), is the number of vegetarians among smokers divided by the total number of smokers. Here the calculation is restricted to smokers, so the probability of being a vegetarian is conditional on the event of being a smoker.

Joint probability: Let ‘S intersection V’, ‘SV’ or ‘S and V’ be an "AND" compound event that a respondent is a smoker and a vegetarian. Here, the multiplication rule for two dependent events is applied: the joint probability of two dependent events is the product of a marginal probability and a conditional probability. In this case, the joint probability of a smoker who is a vegetarian, indicated by P(S intersection V), P(SV) or P(S and V), is the product of P(S) and P(V/S). Likewise, the conditional probabilities P(NV/S), P(V/NS) and P(NV/NS) and the joint probabilities P(NSV), P(SNV) and P(NSNV) are calculated.
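The three definitions above can be written compactly as follows. The count symbols are introduced here only for illustration and do not appear in the original note: n is the total number of respondents, n_S the number of smokers, and n_SV the number of respondents who are both smokers and vegetarians. The slash notation P(V/S) follows this note; many textbooks write the same quantity with a vertical bar.

```latex
\begin{align*}
P(S) &= \frac{n_S}{n} \\
P(V/S) &= \frac{n_{SV}}{n_S} \\
P(S \cap V) &= P(S)\,P(V/S) = \frac{n_S}{n} \cdot \frac{n_{SV}}{n_S} = \frac{n_{SV}}{n}
\end{align*}
```

The last line shows why the product of the marginal and conditional probabilities equals the share of all respondents falling in that cell.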

Calculation

A contingency or cross table can be generated from the given survey dataset in either SPSS or Excel, depending on which software is available. In SPSS, the ‘Crosstabs’ function in the ‘Descriptive Statistics’ group of the ‘Analyze’ menu produces a cross table like Table 2. In Excel, the ‘PivotTable’ function in the ‘Tables’ group of the ‘Insert’ tab can be used to generate a similar cross table.
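For readers without SPSS or Excel, the same kind of cross table can be sketched in plain Python. This is only an illustration: the original data file is not reproduced in this note, so the records below are a hypothetical reconstruction whose cell counts (8, 17, 17 and 58) are inferred from the probabilities quoted in this note, and the function name `cross_table` is made up for the example.

```python
from collections import Counter

# Hypothetical reconstruction of the 100 survey records. The cell counts
# are inferred from the probabilities quoted in this note (assumption),
# not read from the original SPSS file.
records = (
    [("Smoker", "Vegetarian")] * 8
    + [("Smoker", "Non-vegetarian")] * 17
    + [("Non-smoker", "Vegetarian")] * 17
    + [("Non-smoker", "Non-vegetarian")] * 58
)


def cross_table(rows):
    """Count respondents in each (smoking habit, food habit) cell, plus totals."""
    cells = Counter(rows)
    row_totals = Counter(smoke for smoke, _ in rows)
    col_totals = Counter(food for _, food in rows)
    return cells, row_totals, col_totals


cells, row_totals, col_totals = cross_table(records)

# Print the frequencies row by row, followed by each row total.
for smoke in ("Smoker", "Non-smoker"):
    for food in ("Vegetarian", "Non-vegetarian"):
        print(f"{smoke:<11} {food:<15} {cells[(smoke, food)]:>3}")
    print(f"{smoke:<11} {'Total':<15} {row_totals[smoke]:>3}")
```

Dividing each cell count by its row total, column total or the grand total would give the row percent, column percent and percent of total that a cross table reports.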

Table 2: Cross table of smoking habit and food habit

Table 2 consists of four cells and totals, and presents the frequencies, row percent (% within SMOKE), column percent (% within VEG) and percent of total respondents. Now, the concepts discussed above are applied to calculate probabilities.

Simple or marginal probability: Table 2 shows that 25 out of 100 respondents are smokers, so P(S) is equal to 0.25, or 25 percent, as shown by ‘% of Total’. P(NS) is 0.75 or 75 percent. Likewise, P(V) is 0.25 or 25 percent and P(NV) is 0.75 or 75 percent.

Conditional probability: P(V/S) is calculated from the first cell of the table. Eight out of 25 smokers are vegetarians, so P(V/S) is equal to 8 divided by 25, or 0.32, which matches the row percent (% within SMOKE) of 32 percent. Similarly, the other conditional probabilities are P(NV/S) = 0.68, P(V/NS) = 0.227 and P(NV/NS) = 0.773.
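The marginal and conditional probabilities above can be checked with a short Python sketch. The four cell counts are an assumption: they are inferred from the probabilities quoted in this note rather than taken from the data file, and the function names are hypothetical.

```python
# Cell counts inferred from the figures quoted in the text (assumption).
# Keys are (smoking habit, food habit).
counts = {
    ("S", "V"): 8, ("S", "NV"): 17,
    ("NS", "V"): 17, ("NS", "NV"): 58,
}
total = sum(counts.values())  # 100 respondents


def marginal_smoke(smoke):
    """P(smoke): share of all respondents with this smoking habit."""
    return sum(n for (s, _), n in counts.items() if s == smoke) / total


def conditional_food_given_smoke(food, smoke):
    """P(food/smoke): share of this smoking group with the given food habit."""
    group = sum(n for (s, _), n in counts.items() if s == smoke)
    return counts[(smoke, food)] / group


for smoke in ("S", "NS"):
    print(f"P({smoke}) = {marginal_smoke(smoke):.2f}")
    for food in ("V", "NV"):
        p = conditional_food_given_smoke(food, smoke)
        print(f"P({food}/{smoke}) = {p:.3f}")
```

Each conditional probability uses the row total as its denominator, which is why it matches the row percent rather than the percent of total.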

Joint probability: P(SV) is the product of P(S) and P(V/S), that is (25/100) × (8/25) = 0.08, or 8 percent. This is equal to the ‘% of Total’ in the first cell of Table 2.
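As a rough check, the multiplication rule can be applied to all four cells and compared with the direct relative frequency of each cell. The counts below are again an assumption inferred from the probabilities quoted in this note, and `joint_by_multiplication` is a hypothetical helper name.

```python
import math

# Cell counts inferred from the figures quoted in the text (assumption).
counts = {
    ("S", "V"): 8, ("S", "NV"): 17,
    ("NS", "V"): 17, ("NS", "NV"): 58,
}
total = sum(counts.values())  # 100 respondents


def joint_by_multiplication(smoke, food):
    """Joint probability as P(smoke) multiplied by P(food/smoke)."""
    group = counts[(smoke, "V")] + counts[(smoke, "NV")]
    p_marginal = group / total
    p_conditional = counts[(smoke, food)] / group
    return p_marginal * p_conditional


joint = {cell: joint_by_multiplication(*cell) for cell in counts}

for (smoke, food), p in joint.items():
    print(f"P({smoke} and {food}) = {p:.2f}")

# Each joint probability should equal the cell's share of all respondents,
# and the four joint probabilities should sum to one.
for cell, n in counts.items():
    assert math.isclose(joint[cell], n / total)
assert math.isclose(sum(joint.values()), 1.0)
```

The agreement between the product and the direct relative frequency is what makes each joint probability equal to the ‘% of Total’ of its cell.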

After the probability values are filled in manually in the yellow-highlighted cells of Table 2, the table looks like Table 3. The joint probability for each cell, expressed as a percentage, is equal to its ‘% of Total’ value.

Table 3: Cross table of smoking habit and food habit with probabilities manually added

It is hoped that this simple example will make readers curious about applying these concepts to real data.
