06-03-2016 11:04 AM
I am trying to create a new variable and assign it to a set proportion of the observations. For example, a new variable Asthma would be set for 0.2% of the observations in the dataset, based on smoking status, gender, etc. I am using the code below; however, the output looks strange because the proportion of non-smokers with asthma comes out higher. Does anyone know how to do this, or does my code need some adjusting?
if Smoking = 'Smoker' and Sex = 'Male' then asthma = ifn(rand("uniform") <= 0.054, 1, 0);
else if Smoking = 'Non-smoker' and Sex = 'Male' then asthma = ifn(rand("uniform") <= 0.0192, 1, 0);
Thank you for your help in advance!
06-03-2016 12:22 PM
It isn't clear if you are supposed to end up with 0.2% with asthma overall or in the smoking population.
Where did 0.054 and 0.0192 come from?
Do you have any females in the data?
Without data it's a bit hard to tell.
What I would likely do, being of a slightly odd sort, is consider this a stratification and sample selection problem.
You would have 4 strata (Male/Female by Smoker/Non-smoker), and you would provide a SAMPRATE= or SAMPSIZE= value for each stratum, for example with PROC SURVEYSELECT and a STRATA statement.
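A minimal sketch of the stratified idea, written in Python rather than SAS so the logic is explicit: within each stratum, exactly round(rate × n) randomly chosen rows are flagged, instead of every row getting an independent random draw. The field names Sex and Smoking, the function name flag_by_stratum, and the rate table (the prevalences given later in this thread) are assumptions for illustration, not the SAS procedure itself.

```python
import random

# Assumed per-stratum rates (the prevalences quoted later in the thread).
RATES = {
    ("Male", "Smoker"): 0.054,
    ("Male", "Non-smoker"): 0.0192,
    ("Female", "Smoker"): 0.035,
    ("Female", "Non-smoker"): 0.0125,
}


def flag_by_stratum(rows, rates, seed=None):
    """Return a 0/1 flag per row, flagging a fixed count in each stratum.

    rows: list of dicts with hypothetical keys "Sex" and "Smoking".
    Within each stratum of size n, exactly round(rate * n) rows are chosen
    at random without replacement (Python's round breaks .5 ties to even).
    Rows whose stratum is missing from `rates` are never flagged.
    """
    rng = random.Random(seed)

    # Group row positions by (Sex, Smoking) stratum.
    strata = {}
    for i, row in enumerate(rows):
        strata.setdefault((row["Sex"], row["Smoking"]), []).append(i)

    flags = [0] * len(rows)
    for key, positions in strata.items():
        k = round(rates.get(key, 0.0) * len(positions))
        for i in rng.sample(positions, k):
            flags[i] = 1
    return flags
```

Because the count per stratum is fixed, a stratum of 1000 male smokers always gets exactly 54 flagged rows, so each stratum's realized rate matches its target up to rounding, whatever the seed.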
06-03-2016 12:32 PM
Sorry, the 0.2% was just an example. The prevalence of asthma among non-smokers is 1.92% for males and 1.25% for females; among smokers it is 5.4% for males and 3.5% for females.
if Smoker = 'Smoker' AND sex = 'Male' then asthma = ifn(rand("uniform") <= 0.054, 1, 0);
else if Smoker = 'Non-smoker' AND sex = 'Male' then asthma = ifn(rand("uniform") <= 0.0192, 1, 0);
else if Smoker = 'Smoker' AND sex = 'Female' then asthma = ifn(rand("uniform") <= 0.035, 1, 0);
else if Smoker = 'Non-smoker' AND sex = 'Female' then asthma = ifn(rand("uniform") <= 0.0125, 1, 0);
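The per-row logic of those four IF statements can be sketched in Python to check what rates it actually produces. One thing this makes visible: a row whose sex/smoking pair does not exactly match one of the four literals (for example, a different capitalization) gets no value at all, which in SAS would leave asthma missing. The dictionary and function names below are assumptions that mirror the SAS code, not part of it.

```python
import random

# Target prevalence by (sex, smoking status), as given in the post above.
PREVALENCE = {
    ("Male", "Smoker"): 0.054,
    ("Male", "Non-smoker"): 0.0192,
    ("Female", "Smoker"): 0.035,
    ("Female", "Non-smoker"): 0.0125,
}


def assign_asthma(sex, smoking, rng):
    """Independent draw per row, like ifn(rand("uniform") <= p, 1, 0).

    Returns None when the pair matches no stratum exactly, mirroring a SAS
    row that none of the IF statements catches (asthma stays missing).
    """
    p = PREVALENCE.get((sex, smoking))
    if p is None:
        return None
    return 1 if rng.random() <= p else 0


def observed_rate(sex, smoking, n, seed=None):
    """Share of asthma = 1 among n simulated rows of one known stratum."""
    if (sex, smoking) not in PREVALENCE:
        raise ValueError(f"unknown stratum: {(sex, smoking)}")
    rng = random.Random(seed)
    hits = sum(assign_asthma(sex, smoking, rng) for _ in range(n))
    return hits / n
```

With a very large n every stratum lands near its target; in small strata, independent draws can easily come out above or below it.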
06-03-2016 01:16 PM
So did you do a formal test of whether the resulting asthma rates were significantly different from the expected rates?
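One possible formal check is an exact binomial test per stratum: given k cases out of n rows and a target rate p, how surprising is k? Below is a minimal stdlib-only Python sketch with hypothetical function names. It defines the two-sided p-value as the total probability of all counts no more likely than the observed one, which is one common convention among several.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (assumes 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Two-sided exact p-value: sum of P(X = i) over every i that is no more
    likely than the observed k (a small relative tolerance absorbs rounding)."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))
```

Applied to each of the four strata (observed cases, stratum size, target rate), a small p-value would flag a stratum whose simulated rate is unlikely to be chance alone.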
My guess would be that your sample in one or more of the strata is relatively small.
In a quick test generating 100 samples of size 100 with RAND('Uniform'), I got between 1 and 10 values less than or equal to 0.054 per sample, with 20% of the samples having 3 or fewer and 25% having more than 6.
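That quick test can be reproduced roughly as follows, alongside the exact Binomial(100, 0.054) probabilities for the same events. This is a Python sketch with hypothetical function names, not the original SAS program; the simulated numbers will vary with the seed.

```python
import random
from math import comb


def simulate_counts(n_samples=100, sample_size=100, p=0.054, seed=None):
    """Per sample, count uniform draws that are <= p (like rand("uniform") <= p)."""
    rng = random.Random(seed)
    return [sum(rng.random() <= p for _ in range(sample_size))
            for _ in range(n_samples)]


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p); fine for moderate n such as 100."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


counts = simulate_counts(seed=2016)
print("range:", min(counts), "to", max(counts))
print("share <= 3:", sum(c <= 3 for c in counts) / len(counts))
print("share > 6:", sum(c > 6 for c in counts) / len(counts))
print("exact P(X <= 3):", round(binom_cdf(3, 100, 0.054), 3))
print("exact P(X > 6):", round(1 - binom_cdf(6, 100, 0.054), 3))
```

The exact probabilities come out at about 0.2 for 3 or fewer and about 0.3 for more than 6, so shares like the ones above are ordinary sampling noise at n = 100 rather than a problem with RAND.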
You likely need around 1000 observations within each gender/smoking-status group to get rates close to what you expect.
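The "around 1000" figure can be made concrete with the usual normal-approximation standard error of a proportion, sqrt(p(1 - p) / n). This rough sketch uses hypothetical helper names and an approximate 95% margin with z = 1.96. It shows how the margin shrinks as n grows for the 5.4% stratum and what n a chosen margin would need.

```python
from math import ceil, sqrt


def standard_error(p, n):
    """Normal-approximation standard error of an observed proportion."""
    return sqrt(p * (1 - p) / n)


def n_for_margin(p, margin, z=1.96):
    """Smallest n whose approximate margin z * SE is at most `margin`."""
    return ceil(z * z * p * (1 - p) / margin ** 2)


for n in (100, 1000, 10000):
    se = standard_error(0.054, n)
    print(f"n={n}: SE={se:.4f}, approx. 95% margin=+/-{1.96 * se:.4f}")
```

For p = 0.054 the margin is roughly ±4.4 percentage points at n = 100 and ±1.4 at n = 1000; pinning the rate to within ±1 point would take n_for_margin(0.054, 0.01), about 1,960 rows.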