Family Tree

About Me

Kathmandu, Bagmati Zone, Nepal
I am Basan Shrestha from Kathmandu, Nepal. I use the term 'BASAN' as 'Balancing Actions for Sustainable Agriculture and Natural Resources'. I am a Design, Monitoring & Evaluation professional. I hold 1) an MSc in Regional and Rural Development Planning, Asian Institute of Technology, Thailand, 2002; 2) an MSc in Statistics, Tribhuvan University (TU), Kathmandu, Nepal, 1995; and 3) an MA in Sociology, TU, 1997. I have more than 10 years of professional experience in socio-economic research, monitoring and documentation on agricultural and natural resource management. I worked at Lumle Agricultural Research Centre, western Nepal, from Nov. 1997 to Dec. 2000; CARE Nepal, mid-western Nepal, from Mar. 2003 to June 2006; WTLCP in far-western Nepal from June 2006 to Jan. 2011; Training Institute for Technical Instruction (TITI) from July to Sep. 2011; UN Women Nepal from Sep. to Dec. 2011; Mercy Corps Nepal from 24 Jan. 2012 to 14 August 2016; and CAMRIS International in Nepal from 1 February 2017. I have published articles to my credit.

Wednesday, November 27, 2019

Conditional and Joint Probabilities from an Exemplary Survey Dataset, Statistical Note 46

Understanding the concepts of joint and conditional probability, and being able to calculate them from a given dataset, is important in practice.

An example survey dataset consists of one hundred records of randomly sampled respondents categorized by smoking habit (smokers or non-smokers) and food habit (vegetarians or non-vegetarians). A part of the dataset in the value label view of SPSS is shown in Table 1. Calculate the probability that a randomly sampled respondent is a smoker who is a vegetarian.

Table 1: Part of a dataset with smoking and food habits

Concept

Calculating the probability of an "AND" compound event, that a randomly selected respondent is a smoker who is also a vegetarian, involves calculating the probabilities of several other events. Several concepts are introduced in answering this question.

This example has two discrete random variables or categorical variables each with two mutually exclusive categories of response. One categorical variable is the smoking habit of a randomly sampled respondent which has two categories of response: smoker (S) or non-smoker (NS). Another categorical variable is the food habit which also has two mutually exclusive categories: vegetarian (V) and non-vegetarian (NV).

Simple or marginal probability: Let ‘S’ be the event that a randomly sampled respondent is a smoker. The probability of this event, written P(S), is the total number of smokers divided by the total number of respondents. It is also referred to as the relative frequency. P(NS), P(V) and P(NV) are calculated similarly.

Conditional probability: Let V/S be the event that a respondent is a vegetarian given that the respondent is a smoker. The conditional probability of vegetarians among smokers, written P(V/S), is the number of vegetarians among smokers divided by the total number of smokers. Here the probability of being a vegetarian is conditioned on the respondent being a smoker.

Joint probability: Let ‘S intersection V’, ‘SV’ or ‘S and V’ be the "AND" compound event that a respondent is both a smoker and a vegetarian. Here, the multiplication rule for two dependent events applies: the joint probability of two dependent events is the product of a marginal probability and a conditional probability. In this case, the joint probability of a smoker who is a vegetarian, written P(S intersection V), P(SV) or P(S and V), is the product of P(S) and P(V/S). The remaining conditional probabilities P(NV/S), P(V/NS) and P(NV/NS) and the joint probabilities P(NSV), P(SNV) and P(NSNV) are calculated likewise.
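Written as formulas, the three definitions above can be summarized as follows. This is only a compact restatement: the bar notation $P(V \mid S)$ is the conventional way of writing what this note calls P(V/S), and $n(\cdot)$ is a symbol introduced here for a count of respondents, with $n$ the total number of respondents.

$$P(S) = \frac{n(S)}{n}, \qquad P(V \mid S) = \frac{n(S \cap V)}{n(S)}, \qquad P(S \cap V) = P(S)\,P(V \mid S) = \frac{n(S \cap V)}{n}$$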

Calculation

A contingency or cross table can be generated from the given survey dataset in either SPSS or Excel, depending on which software is available. In SPSS, using the ‘Crosstabs’ function in the ‘Descriptive Statistics’ group of the ‘Analyze’ menu, one can get the cross table shown in Table 2. In Excel, the ‘Pivot Table’ function in the ‘Tables’ group of the ‘Insert’ tab can be used to generate a similar cross table.
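For readers without SPSS or Excel, the same kind of cross table can be sketched in a few lines of Python. The example below is an illustration, not the procedure used in this note. The cell counts are an assumption, inferred from the figures this note reports for Table 2 (25 smokers, eight of them vegetarian, and 25 vegetarians among 100 respondents). The category labels and the helper name `cross_tabulate` are hypothetical.

```python
from collections import Counter

# Assumed cell counts, inferred from the figures quoted in this note
# (25 smokers, 8 of them vegetarian, 25 vegetarians among 100 respondents).
CELL_COUNTS = {
    ("Smoker", "Vegetarian"): 8,
    ("Smoker", "Non-vegetarian"): 17,
    ("Non-smoker", "Vegetarian"): 17,
    ("Non-smoker", "Non-vegetarian"): 58,
}

# Expand the counts into one (smoking habit, food habit) record per respondent,
# mimicking the one-row-per-respondent layout of a survey dataset.
records = [pair for pair, n in CELL_COUNTS.items() for _ in range(n)]


def cross_tabulate(rows):
    """Count how many records fall into each (smoking habit, food habit) cell."""
    return Counter(rows)


table = cross_tabulate(records)
row_totals = Counter(smoke for smoke, _ in records)  # totals by smoking habit
col_totals = Counter(food for _, food in records)    # totals by food habit

foods = ("Vegetarian", "Non-vegetarian")
for smoke in ("Smoker", "Non-smoker"):
    cells = [table[(smoke, food)] for food in foods]
    print(smoke, cells, "row total:", row_totals[smoke])
print("Column totals:", [col_totals[food] for food in foods], "N:", len(records))
```

A `Counter` keyed by pairs is enough for a two-by-two table; a dedicated data-analysis library would be more convenient for larger datasets.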

Table 2: Cross table of smoking habit and food habit

Table 2 consists of four cells plus totals and presents the frequencies, row percent (% within SMOKE), column percent (% within VEG) and percent of total respondents. The concepts discussed above are now applied to calculate the probabilities.

Simple or marginal probability: Table 2 shows that 25 out of 100 respondents are smokers, so P(S) is equal to 0.25, or 25 percent, as shown by ‘% of Total’. P(NS) is 0.75 or 75 percent. Likewise, P(V) is 0.25 or 25 percent and P(NV) is 0.75 or 75 percent.

Conditional probability: P(V/S) is calculated from the first cell of the table. Eight out of 25 smokers are vegetarians, so P(V/S) is eight divided by 25, which equals 0.32; this matches the row percent (% within SMOKE) of 32 percent. Similarly, the other conditional probabilities are P(NV/S)=0.68, P(V/NS)=0.227 and P(NV/NS)=0.773.

Joint probability: P(SV) is the product of P(S) and P(V/S), that is, (25/100) × (8/25), which equals 0.08 or 8 percent. This is equal to ‘% of Total’ in the first cell of Table 2.
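The three calculations above can be checked with a short Python sketch. The four cell counts are an assumption, inferred from the probabilities quoted in this note rather than read from the original table. The function names are hypothetical.

```python
# Assumed 2x2 cell counts, inferred from the probabilities quoted in this note:
# keys are (smoking habit, food habit) with S/NS and V/NV as in the text.
counts = {("S", "V"): 8, ("S", "NV"): 17, ("NS", "V"): 17, ("NS", "NV"): 58}
total = sum(counts.values())  # 100 respondents


def marginal_smoke(smoke):
    """P(smoke): share of all respondents with the given smoking habit."""
    return sum(n for (s, _), n in counts.items() if s == smoke) / total


def marginal_food(food):
    """P(food): share of all respondents with the given food habit."""
    return sum(n for (_, f), n in counts.items() if f == food) / total


def conditional(food, smoke):
    """P(food / smoke): share with the food habit among those with the smoking habit."""
    smoke_total = sum(n for (s, _), n in counts.items() if s == smoke)
    return counts[(smoke, food)] / smoke_total


def joint(smoke, food):
    """P(smoke and food) by the multiplication rule: P(smoke) * P(food / smoke)."""
    return marginal_smoke(smoke) * conditional(food, smoke)


print("P(S)    =", marginal_smoke("S"))
print("P(V/S)  =", conditional("V", "S"))
print("P(V/NS) =", round(conditional("V", "NS"), 3))
print("P(SV)   =", round(joint("S", "V"), 2))
```

For every cell, the product rule gives the same value as dividing the cell count by the total, which is why the joint probability matches ‘% of Total’.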

After manually filling in the probability values in the yellow-highlighted cells of Table 2, the table looks like Table 3. The joint probability for each cell is equal to its percent-of-total value.

Table 3: Cross table of smoking habit and food habit with probabilities manually added

It is hoped that this simple example will make readers curious about applying the concept to real data.

Monday, November 25, 2019

Simple Probability Calculation from an Exemplary Survey Dataset, Statistical Note 45

Understanding the concept of simple or marginal probability, and being able to calculate it from a given dataset, is important in practice.

An example survey dataset consists of one hundred records of randomly sampled respondents categorized as smokers or non-smokers. A part of the dataset in the value label view of SPSS is shown in Table 1. Calculate the probability that a randomly sampled respondent is a non-smoker.


Table 1: Part of dataset of smokers and non-smokers



Concept 


The simple or marginal probability of an event is the number of favorable cases divided by the total number of cases. Let ‘S’ be the simple event that a respondent is a smoker. The simple or marginal probability of ‘S’, written P(S), is the total number of smokers divided by the total number of respondents. It is also referred to as the relative frequency.

Calculation 

The survey dataset can be summarized in either SPSS or Excel, depending on which software is available. In SPSS, using the ‘Frequencies’ function in the ‘Descriptive Statistics’ group of the ‘Analyze’ menu, one can get the frequency table shown in Table 2.
 
Table 2: Frequency table of smokers and non-smokers

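In Excel, a frequency table for a categorical variable such as this can be generated with a ‘Pivot Table’ (in the ‘Insert’ tab) or with the COUNTIF function; the ‘Descriptive Statistics’ tool in the ‘Data Analysis’ add-in summarizes numeric variables rather than tabulating categories.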

The percent or valid percent column shows that 20 percent of respondents are non-smokers, which means that 20 out of 100 respondents are non-smokers. Thus, the simple or marginal probability that a randomly sampled respondent is a non-smoker is 20 divided by 100, which equals 0.20. Similarly, the simple or marginal probability of a smoker is 0.80.
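The relative-frequency calculation can also be sketched in Python. The list of responses below is an assumption: it is rebuilt from the reported frequencies (80 smokers and 20 non-smokers), not taken from the original dataset. The function name `relative_frequencies` is hypothetical.

```python
from collections import Counter

# Assumed data, rebuilt from the reported frequencies: 80 smokers, 20 non-smokers.
responses = ["Smoker"] * 80 + ["Non-smoker"] * 20


def relative_frequencies(values):
    """Return each category's count divided by the number of records."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}


probabilities = relative_frequencies(responses)
for category, probability in probabilities.items():
    print(f"P({category}) = {probability:.2f}")
```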
I hope this simple example will make readers curious about applying the concept to real data.


Thursday, November 21, 2019

Same Sample Proportion with Different Sample Sizes for Chi-Squared Test for Goodness of Fit, Statistical Note 44

For the same sample proportion, the level of confidence at which a result is statistically significant varies with the sample size.

For example, an expert is interested in the proportion of smokers among randomly sampled respondents. The expert assumes that half of the adult population are smokers and asks the adults one question: Are you a smoker? Each respondent answers in one of two categories, Yes or No.

The expert first tries a sample of 100 individuals and finds that 55 respondents are non-smokers and the remaining 45 are smokers. He uses the chi-squared test for goodness of fit, with one degree of freedom, to test whether the sample proportions of non-smokers and smokers are consistent with the assumed population proportions, using the formula below:
$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$

where:
$O_i$ = sampled/observed frequency for the ith category
$E_i$ = population/expected frequency for the ith category

Using the above formula, the expert calculates the chi-squared value for a sample of 100 as:
$\chi^2 = \sum_i (O_i - E_i)^2 / E_i = (55-50)^2/50 + (45-50)^2/50 = 1$
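The same arithmetic can be written as a small Python function. This is a minimal sketch of the goodness-of-fit statistic as defined above. The function name `chi_square_gof` is hypothetical, and the inputs are the observed and expected counts from the example.

```python
def chi_square_gof(observed, expected):
    """Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts: 55 non-smokers and 45 smokers; expected counts: 50 each.
chi2 = chi_square_gof([55, 45], [50, 50])
print(chi2)
```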

Curious, the expert then calculates chi-squared values for the same sample proportion of non-smokers with increasing sample sizes, as below:

Table 1: Sample Size with Same Sample Proportion of Non-Smokers, Chi-Squared Value and Level of Significance

The expert finds that with up to 300 respondents the result is not significant at the 95 percent confidence level, so the difference between the sample proportions and the assumed population proportions could be due to sampling error. From 400 respondents upward, the result is significant at the 95 percent level. Thus, among the sample sizes tried, a sample of at least 400 is required for a sample proportion of non-smokers of 0.55 to significantly outnumber the sample proportion of smokers (0.45). In other words, 400 respondents need to be sampled for 55 percent non-smokers to significantly outnumber 45 percent smokers.
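The pattern described above can be reproduced with a short Python sketch. The sample sizes in the loop are assumptions chosen for illustration, since only 100, 300 and 400 are mentioned in the text. The critical value 3.841 is the standard 5 percent cut-off for one degree of freedom. The p-value uses the identity that, for one degree of freedom, the upper-tail probability of a chi-squared value x equals erfc(sqrt(x/2)). All function names are hypothetical.

```python
import math

CRITICAL_95_DF1 = 3.841  # chi-squared critical value, 1 degree of freedom, 5% level


def chi_square_gof(observed, expected):
    """Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-squared value with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def sweep(sample_sizes, pct_non_smokers=55):
    """Chi-squared value, p-value and significance flag for each sample size."""
    rows = []
    for n in sample_sizes:
        non_smokers = n * pct_non_smokers // 100  # same 55% share at every size
        smokers = n - non_smokers
        chi2 = chi_square_gof([non_smokers, smokers], [n / 2, n / 2])
        rows.append((n, chi2, p_value_df1(chi2), chi2 > CRITICAL_95_DF1))
    return rows


# Assumed sample sizes for illustration; the original table is not reproduced here.
results = sweep([100, 200, 300, 400, 500])
for n, chi2, p, significant in results:
    print(f"n={n:>4}  chi2={chi2:.2f}  p={p:.4f}  significant at 95%: {significant}")
```

With a fixed 55/45 split the statistic grows in proportion to the sample size, which is why the same proportion crosses the 3.841 threshold only once the sample is large enough.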

Tuesday, November 19, 2019

One Sample Proportion for Statistical Significance and Sample Size, Statistical Note 43

As the sample size increases, even a slightly larger proportion in the category of interest can significantly outnumber the other category of a binary response.

For example, an expert is interested in the proportion of smokers among randomly sampled respondents. The expert assumes that half of the adult population are smokers and asks the adults one question: Are you a smoker? Each respondent answers in one of two categories, Yes or No.


The expert estimates how many non-smokers would be needed to statistically outnumber the smokers, so that a valid conclusion can be drawn. The expert uses the following formula to calculate the minimum number of non-smokers, for a given sample size, needed to outnumber the smokers:

$z = (p' - p) / \sqrt{pq/n}$
where:
$z$ = test statistic, standard normal variate, with a value of 1.96 at the 95% level of confidence
$p'$ = sample proportion of non-smokers
$p$ = population proportion of non-smokers, equal to 0.50
$q$ = population proportion of smokers, equal to 0.50
$n$ = sample size
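
To find the minimum sample proportion for a given sample size, the formula can be rearranged for p’. This step is not spelled out in the original and is shown here as a sketch:

$$p' = p + z\sqrt{\frac{pq}{n}}$$

With p = q = 0.50 and z = 1.96, this reduces to $p' = 0.50 + 0.98/\sqrt{n}$, so the required margin above 0.50 shrinks in proportion to $1/\sqrt{n}$.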

The expert starts with a sample size of 10 individuals and calculates the minimum sample proportion, or number, of non-smokers required to outnumber the smokers with statistical significance. He then gradually increases the sample size and repeats the calculation, as shown in the table below:

Table 1: Number of Non-Smokers Required to Significantly Outnumber the Smokers

The expert first asks whether six non-smokers out of 10, a sample non-smoker proportion of 0.60, is enough to outnumber the smokers with statistical significance. Using the above formula, the expert finds that a sample non-smoker proportion of 0.8099, or about eight non-smokers out of 10 respondents, is required. The expert then tries a hypothetical sample of one hundred thousand respondents, assuming that 50,001 non-smokers would outnumber 49,999 smokers. Instead, the formula shows that a sample non-smoker proportion of about 0.5031, or roughly 50,310 non-smokers, is required. The expert finally understands that as the sample size increases, a progressively smaller sample proportion of non-smokers is enough to significantly outnumber the smokers, in comparison with the assumed sample proportions presented in the ‘realtime’ column of the table.
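The minimum proportions can be recomputed with a short Python sketch that solves the z formula for p’. This is an illustration under the note's assumptions (p = q = 0.50 and z = 1.96). The function name `min_sample_proportion` is hypothetical, and the intermediate sample sizes in the loop are assumed for illustration, since only 10 and 100,000 are quoted in the text.

```python
import math

Z_95 = 1.96  # standard normal value used in the text for the 95% confidence level


def min_sample_proportion(n, p=0.5, z=Z_95):
    """Smallest sample proportion p' at which (p' - p) / sqrt(p*q/n) reaches z."""
    q = 1 - p
    return p + z * math.sqrt(p * q / n)


# Assumed sample sizes for illustration; only 10 and 100,000 appear in the text.
for n in (10, 100, 1_000, 10_000, 100_000):
    p_min = min_sample_proportion(n)
    print(f"n={n:>7}  minimum p'={p_min:.4f}  non-smokers={p_min * n:,.1f}")
```

The non-smoker count is printed as a fraction of a respondent on purpose; how to round it to a whole number of people is left to the reader.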

Sunday, November 17, 2019

Pets: Part of one’s family

Handling bulls and old cows has become a nuisance for their owners, as the animals are unproductive or less productive and cost more to rear. The government should therefore have a system of tagging cows and bulls so that the owners can be identified if the animals are left loose in the streets.
Unproductive or less productive cows and bulls can be reared in groups by individuals or organisations so that their feed is not wasted and their by-products can be used. For example, cow urine is considered holy and is believed to have medicinal value, so it can be sold to users. Cow dung can be used to produce biogas, and the slurry can be sold to farmers who practise organic farming or use biodegradable manures. Alternatively, unproductive or less productive cows and bulls can be sold or given to organisations such as Jatayu Restaurant, a vulture conservation restaurant. Similarly, bulls can be castrated and given to rural households for use as draught power in ploughing land. If the government could ensure the quality of the cow milk sold in the market, it should charge a tax of at least one to a few rupees per litre of cow milk to raise a fund for the welfare of bulls and old cows.

Basan Shrestha,
Ghattekulo, Kathmandu

http://epaper.thehimalayantimes.com/html5/reader/production/default.aspx?pubname=&pubid=cd7278e2-4150-475f-8abe-305e5ed57783