Understanding joint and conditional probabilities, and being able to
calculate them from a given dataset, is important in practice.
An example survey dataset consists of one hundred records of randomly
sampled respondents categorized by smoking habit (smokers or non-smokers) and food
habit (vegetarians or non-vegetarians). Part of the dataset, in the value label
view of SPSS, is shown in Table 1. Calculate the probability that a randomly
sampled respondent is a non-smoker who is a non-vegetarian.
Concept
Calculating the probability of an "AND" compound event, such as a
randomly selected respondent being a smoker who is a vegetarian, also involves
calculating the probabilities of other events. Several concepts are introduced
while answering this question.
This example has two discrete random variables or categorical variables
each with two mutually exclusive categories of response. One categorical
variable is the smoking habit of a randomly sampled respondent which has two
categories of response: smoker (S) or non-smoker (NS). Another categorical
variable is the food habit which also has two mutually exclusive categories:
vegetarian (V) and non-vegetarian (NV).
Simple or marginal probability: Let ‘S’ be the random event that
a randomly sampled respondent is a smoker. The probability of this event,
represented by P(S), is the total number of smokers divided by the total number of
respondents. It is also referred to as the relative frequency. P(NS), P(V) and
P(NV) are calculated similarly.
Conditional probability: Let V/S be the event that a respondent is a
vegetarian given that the respondent is a smoker. The conditional probability of
vegetarians among smokers, symbolized by P(V/S), is the number of vegetarians
among smokers divided by the total number of smokers. Here the event is
evaluated only within the group of smokers, so the denominator is the number of
smokers rather than the total number of respondents.
Joint probability: Let ‘S intersection V’, ‘S∩V’ or ‘S
and V’ be the "AND" compound event that a respondent is both a smoker and a
vegetarian. Here, the multiplication rule for two dependent events is applied:
the joint probability of two dependent events is the product of a marginal
probability and a conditional probability. In this case, the joint
probability of a smoker who is a vegetarian, denoted by P(S intersection V),
P(S∩V) or P(S and V), is the product of P(S) and P(V/S). The other conditional
probabilities, P(NV/S), P(V/NS) and P(NV/NS), and the other joint probabilities,
P(NS∩V), P(S∩NV) and P(NS∩NV), are calculated likewise.
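The three definitions above can be written compactly as follows. The symbols n (total respondents), n_S (number of smokers) and n_{S∩V} (number of vegetarian smokers) are introduced here only for illustration, and P(V | S) is the same quantity that the text writes as P(V/S).

```latex
\begin{align*}
P(S) &= \frac{n_S}{n}, \\
P(V \mid S) &= \frac{n_{S \cap V}}{n_S}, \\
P(S \cap V) &= P(S)\,P(V \mid S)
             = \frac{n_S}{n} \cdot \frac{n_{S \cap V}}{n_S}
             = \frac{n_{S \cap V}}{n}.
\end{align*}
```

The last line shows why the product of the marginal and the conditional probability reduces to the cell count divided by the total number of respondents.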
Calculation
A contingency table (cross table) can be generated from the given survey
dataset in either SPSS or Excel, depending on which software is available.
In SPSS, the ‘Crosstabs’ function in the ‘Descriptive Statistics’ group
of the ‘Analyze’ menu produces the cross table shown in Table 2. In Excel, the
‘Pivot Table’ function in the ‘Tables’ group of the ‘Insert’ tab can be used to
generate a similar cross table.
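For readers without SPSS or Excel, the same kind of cross table can be sketched in plain Python. This is only an illustration: the raw dataset is not reproduced here, so the records are rebuilt from cell counts. The count of 8 vegetarian smokers is stated in the text; the other three counts are inferred from the reported totals (25 smokers, 25 vegetarians, 100 respondents). The names cell_counts, records and crosstab are hypothetical.

```python
from collections import Counter

# Cell counts: 8 vegetarian smokers is stated in the text; the other three
# are inferred from the reported totals (25 smokers, 25 vegetarians, 100 respondents).
cell_counts = {
    ("S", "V"): 8,
    ("S", "NV"): 17,
    ("NS", "V"): 17,
    ("NS", "NV"): 58,
}

# Expand into one (smoke, veg) record per respondent, mimicking the raw dataset.
records = [cell for cell, n in cell_counts.items() for _ in range(n)]

# Cross-tabulate: count each (smoke, veg) combination, plus row and column totals.
crosstab = Counter(records)
row_totals = Counter(smoke for smoke, _ in records)
col_totals = Counter(veg for _, veg in records)

# Print the frequencies in the layout of a 2 x 2 cross table with totals.
print(f"{'':>6}{'V':>6}{'NV':>6}{'Total':>8}")
for smoke in ("S", "NS"):
    v, nv = crosstab[(smoke, "V")], crosstab[(smoke, "NV")]
    print(f"{smoke:>6}{v:>6}{nv:>6}{row_totals[smoke]:>8}")
print(f"{'Total':>6}{col_totals['V']:>6}{col_totals['NV']:>6}{len(records):>8}")
```

With a real dataset, the records list would be read from the data file instead of being rebuilt from counts.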
Table 2 consists of four cells plus totals and presents the frequencies, row percent (% within SMOKE), column percent (% within VEG) and percent of total respondents. The concepts discussed above are now applied to calculate the probabilities.
Simple or Marginal Probability:
Table 2 shows that 25 out of 100 respondents are smokers, so P(S) is
equal to 0.25, or 25 percent, as shown by ‘% of Total’.
P(NS) is 0.75 or 75 percent. Likewise, P(V) is 0.25 or 25
percent and P(NV) is 0.75 or 75 percent.
Conditional Probability: P(V/S)
is calculated from the first cell of the table. Eight out of 25 smokers
are vegetarians, so P(V/S) is equal to 8 divided by 25, or 0.32,
which matches the row percent (% within SMOKE) of 32%.
Similarly, the other conditional probabilities are P(NV/S)=0.68, P(V/NS)=0.227
and P(NV/NS)=0.773.
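The marginal and conditional probabilities above can be checked with a short Python sketch. It assumes the same inferred cell counts (8, 17, 17 and 58) and uses exact fractions to avoid rounding surprises. The function names marginal_smoke and conditional_veg_given_smoke are hypothetical.

```python
from fractions import Fraction

# Cell counts keyed by (smoking habit, food habit); only the 8 is stated
# directly in the text, the rest are inferred from the reported totals.
counts = {("S", "V"): 8, ("S", "NV"): 17, ("NS", "V"): 17, ("NS", "NV"): 58}
total = sum(counts.values())


def marginal_smoke(smoke):
    """P(smoke): row total divided by the total number of respondents."""
    row_total = sum(n for (s, _), n in counts.items() if s == smoke)
    return Fraction(row_total, total)


def conditional_veg_given_smoke(veg, smoke):
    """P(veg/smoke): cell count divided by the row total for that smoking habit."""
    row_total = sum(n for (s, _), n in counts.items() if s == smoke)
    return Fraction(counts[(smoke, veg)], row_total)


for smoke in ("S", "NS"):
    print(f"P({smoke}) = {float(marginal_smoke(smoke)):.3f}")
    for veg in ("V", "NV"):
        p = conditional_veg_given_smoke(veg, smoke)
        print(f"P({veg}/{smoke}) = {float(p):.3f}")
```

The conditional probabilities within each row add up to one, which is a quick sanity check on the row percents.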
Joint probability: P(S∩V) is the product of P(S) and P(V/S), that is,
(25/100) × (8/25), which equals 0.08 or 8 percent.
This is equal to ‘% of Total’ in the first cell of Table 2.
After the probability values are filled in manually in the yellow-highlighted
cells of Table 2, the table looks like Table 3. The joint probability for each cell, expressed as a percentage, is equal to its ‘% of Total’ value.
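As a rough check, the joint probabilities for all four cells can be computed with the multiplication rule and compared with the cell count divided by the total. The sketch below again assumes the inferred cell counts (8, 17, 17 and 58); the names counts, row_totals and joint are hypothetical.

```python
from fractions import Fraction

# Cell counts keyed by (smoking habit, food habit); only the 8 is stated
# directly in the text, the rest are inferred from the reported totals.
counts = {("S", "V"): 8, ("S", "NV"): 17, ("NS", "V"): 17, ("NS", "NV"): 58}
total = sum(counts.values())

# Row totals: number of smokers and number of non-smokers.
row_totals = {}
for (smoke, _), n in counts.items():
    row_totals[smoke] = row_totals.get(smoke, 0) + n

# Multiplication rule: P(smoke and veg) = P(smoke) * P(veg/smoke).
joint = {}
for (smoke, veg), n in counts.items():
    p_smoke = Fraction(row_totals[smoke], total)
    p_veg_given_smoke = Fraction(n, row_totals[smoke])
    joint[(smoke, veg)] = p_smoke * p_veg_given_smoke
    # The product must equal the cell's share of all respondents.
    assert joint[(smoke, veg)] == Fraction(n, total)

# The four joint probabilities cover every respondent exactly once.
assert sum(joint.values()) == 1

for (smoke, veg), p in joint.items():
    print(f"P({smoke} and {veg}) = {float(p):.2f}")
```

Under these inferred counts, the probability asked for at the start, that a respondent is a non-smoker who is a non-vegetarian, is 58 out of 100, or 0.58.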
It is hoped that this simple example will make readers curious about applying these concepts to real-world data.