Basan Shrestha's Diary: Conditional and Joint Probabilities from an Exemplary Survey Dataset, Statistical Note 46

Understanding joint and conditional probabilities, and being able to calculate them from a given dataset, is an important practical skill.

An exemplary survey dataset consists of one hundred records of randomly sampled respondents categorized by smoking habit (smoker or non-smoker) and food habit (vegetarian or non-vegetarian). A part of the dataset, in the value label view of SPSS, is shown in Table 1. Calculate the probability that a randomly sampled respondent is a non-smoker who is a non-vegetarian.

Table 1: Part of a dataset with smoking and food habits

Concept

Calculating the probability of an "AND" compound event, such as a randomly selected respondent being a smoker who is a vegetarian, also involves calculating the probabilities of other events. Several concepts are introduced while answering this question.

This example has two discrete random variables or categorical variables each with two mutually exclusive categories of response. One categorical variable is the smoking habit of a randomly sampled respondent which has two categories of response: smoker (S) or non-smoker (NS). Another categorical variable is the food habit which also has two mutually exclusive categories: vegetarian (V) and non-vegetarian (NV).

Simple or marginal probability: Let ‘S’ be the event that a randomly sampled respondent is a smoker. The probability of this event, represented by P(S), is the total number of smokers divided by the total number of respondents. It is also referred to as the relative frequency. P(NS), P(V) and P(NV) are calculated similarly.

Conditional probability: Let V/S be the conditional event that a respondent is a vegetarian given that the respondent is a smoker. The conditional probability of vegetarians among smokers, symbolized by P(V/S) and often written P(V|S), is the number of vegetarians among smokers divided by the total number of smokers. Here the calculation is restricted to smokers: only respondents who are smokers are counted in the denominator.

Joint probability: Let ‘S intersection V’, ‘S∩V’ or ‘S and V’ be the "AND" compound event that a respondent is both a smoker and a vegetarian. Here, the multiplication rule for two dependent events is applied: the joint probability is the product of a marginal probability and a conditional probability. In this case, the joint probability of a smoker who is a vegetarian, denoted P(S intersection V), P(S∩V) or P(S and V), is the product of P(S) and P(V/S). Likewise, P(NV/S), P(V/NS), P(NV/NS), P(NS∩V), P(S∩NV) and P(NS∩NV) are calculated.
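The three definitions above can be written compactly. In this sketch, n(·) denotes a count of respondents and n the total number of respondents; that count notation is an assumption introduced here, while P(V/S) follows this note's own notation for the conditional probability.

```latex
\begin{align*}
P(S) &= \frac{n(S)}{n}, \\
P(V/S) &= \frac{n(S \cap V)}{n(S)}, \\
P(S \cap V) &= P(S)\,P(V/S)
             = \frac{n(S)}{n} \cdot \frac{n(S \cap V)}{n(S)}
             = \frac{n(S \cap V)}{n}.
\end{align*}
```

The last line shows why the joint probability obtained from the multiplication rule equals the cell count divided by the total number of respondents.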

Calculation

A contingency table (cross table) can be generated from the survey dataset in either SPSS or Excel, depending on which software is available. In SPSS, the ‘Crosstabs’ function in the ‘Descriptive Statistics’ group of the ‘Analyze’ menu produces a cross table such as Table 2. In Excel, the ‘PivotTable’ function in the ‘Tables’ group of the ‘Insert’ tab can be used to generate a similar cross table.
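For readers without SPSS or Excel, the same kind of cross table can be sketched in Python with the standard library. This is only an illustration, not the procedure used in this note: the `records` list is a hypothetical stand-in for the 100-row survey file, rebuilt from the cell counts (8, 17, 17, 58) that are implied by the probabilities quoted later in the note, and the labels and the `cross_table` helper are names chosen here.

```python
from collections import Counter

# Hypothetical stand-in for the survey file: one (smoking habit, food habit)
# pair per respondent. The cell counts are inferred from this note's figures,
# not read from the original dataset.
records = (
    [("Smoker", "Vegetarian")] * 8
    + [("Smoker", "Non-vegetarian")] * 17
    + [("Non-smoker", "Vegetarian")] * 17
    + [("Non-smoker", "Non-vegetarian")] * 58
)


def cross_table(rows):
    """Count how many records fall in each (smoking habit, food habit) cell."""
    return Counter(rows)


table = cross_table(records)
row_totals = Counter(smoke for smoke, _ in records)  # totals per smoking habit
col_totals = Counter(food for _, food in records)    # totals per food habit

foods = ("Vegetarian", "Non-vegetarian")
for smoke in ("Smoker", "Non-smoker"):
    cells = [table[(smoke, food)] for food in foods]
    print(smoke, cells, "row total:", row_totals[smoke])
print("Column totals:", [col_totals[food] for food in foods], "N:", len(records))
```

The four cell counts, the row and column totals and the grand total are the frequencies that a cross table reports.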

Table 2: Cross table of smoking habit and food habit

Table 2 consists of four cells plus totals and presents the frequencies, row percent (% within SMOKE), column percent (% within VEG) and percent of total respondents. The concepts discussed above are now applied to calculate the probabilities.

Simple or marginal probability: Table 2 shows that 25 out of 100 respondents are smokers, so P(S) is 0.25, or 25 percent, as shown by ‘% of Total’. P(NS) is 0.75, or 75 percent. Likewise, P(V) is 0.25, or 25 percent, and P(NV) is 0.75, or 75 percent.
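The marginal probabilities can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the nested `counts` dictionary is a layout chosen here, and the cell counts are inferred from the figures in this note (8 vegetarians among 25 smokers, and a non-smoker row consistent with P(V/NS) = 0.227).

```python
# Cell counts inferred from this note's figures (rows: smoking habit,
# columns: food habit). The dictionary layout is an assumption of this sketch.
counts = {
    "S": {"V": 8, "NV": 17},
    "NS": {"V": 17, "NV": 58},
}
total = sum(sum(row.values()) for row in counts.values())

# Marginal probability = row (or column) total divided by the grand total.
p_smoking = {smoke: sum(row.values()) / total for smoke, row in counts.items()}
p_food = {
    food: sum(row[food] for row in counts.values()) / total
    for food in ("V", "NV")
}

print("P(S), P(NS):", p_smoking)
print("P(V), P(NV):", p_food)
```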

Conditional probability: P(V/S) is calculated from the first cell of the table. Eight out of 25 smokers are vegetarians, so P(V/S) is 8 divided by 25, or 0.32, which matches the row percent (% within SMOKE) of 32%. Similarly, the other conditional probabilities are P(NV/S)=0.68, P(V/NS)=0.227 and P(NV/NS)=0.773.
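The row-percent reading of conditional probability can be reproduced as below. This is a sketch, not the SPSS output itself: the `conditional` helper is a hypothetical name, and the non-smoker counts (17 and 58) are inferred from the quoted P(V/NS) = 0.227 and the 75 non-smokers.

```python
# 8 of 25 smokers are vegetarians (stated in the note); the non-smoker row
# is inferred from P(V/NS) = 0.227 with 75 non-smokers.
counts = {
    "S": {"V": 8, "NV": 17},
    "NS": {"V": 17, "NV": 58},
}


def conditional(counts, food, smoke):
    """P(food/smoke): share of one smoking group that has the given food habit."""
    row_total = sum(counts[smoke].values())
    return counts[smoke][food] / row_total


for smoke in ("S", "NS"):
    for food in ("V", "NV"):
        print(f"P({food}/{smoke}) = {conditional(counts, food, smoke):.3f}")
```

Within each smoking group the two conditional probabilities sum to one, which is a quick check on the row percents.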

Joint probability: P(S∩V) is the product of P(S) and P(V/S), that is (25/100) × (8/25) = 0.08, or 8 percent. This equals the ‘% of Total’ in the first cell of Table 2.
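The multiplication rule can be applied to all four cells and compared with the direct cell proportion. The sketch below assumes the same inferred cell counts as the rest of this note (8, 17, 17, 58); `joint_by_multiplication` is a hypothetical helper name.

```python
import math

# Cell counts inferred from this note's figures.
counts = {
    "S": {"V": 8, "NV": 17},
    "NS": {"V": 17, "NV": 58},
}
total = sum(sum(row.values()) for row in counts.values())


def joint_by_multiplication(counts, total, smoke, food):
    """Joint probability as marginal P(smoke) times conditional P(food/smoke)."""
    row_total = sum(counts[smoke].values())
    marginal = row_total / total
    conditional = counts[smoke][food] / row_total
    return marginal * conditional


joint = {
    (smoke, food): joint_by_multiplication(counts, total, smoke, food)
    for smoke in counts
    for food in counts[smoke]
}

for (smoke, food), p in joint.items():
    # The product should agree with the cell count divided by the grand total.
    assert math.isclose(p, counts[smoke][food] / total)
    print(f"P({smoke} and {food}) = {p:.2f}")
```

Under these counts, the probability asked for at the start of the note, that a respondent is a non-smoker who is a non-vegetarian, comes out as 0.75 × (58/75) = 0.58, and the four joint probabilities sum to one.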

When the probability values are filled in manually in the yellow-highlighted cells of Table 2, the result is Table 3. The joint probability for each cell, expressed as a percentage, equals that cell's ‘% of Total’ value.

Table 3: Cross table of smoking habit and food habit with probabilities manually added