Hello,
I am trying to create a new variable and assign it to a given proportion of the observations. For example, a new variable Asthma that is assigned to 0.2% of the observations in the dataset, based on smoking status, gender, etc. I am using the code below, but I am getting strange output: the proportion with asthma is higher among non-smokers. Does anyone know how to do this, or does my code need adjusting?
data X;
set X;
if Smoking= 'Smoker' and Sex='Male' then asthma= ifn(rand("uniform") <= 0.054, 1, 0);
if Smoking= 'Non-smoker' and Sex='Male' then asthma = ifn(rand("uniform") <= 0.0192, 1, 0);
run;
Thank you for your help in advance!
It isn't clear if you are supposed to end up with 0.2% with asthma overall or in the smoking population.
Where did 0.054 and 0.0192 come from?
Do you have any females in the data?
Without data it's a bit hard to say.
What I would likely do, being of a slightly odd sort, is treat this as a stratification and sample selection problem.
You would have 4 strata: Male/Female x Smoker/Non-smoker. You would provide a samprate or sampsize value for each stratum.
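A minimal sketch of that stratified-selection idea with PROC SURVEYSELECT, not a definitive implementation. It assumes the variables are named sex and Smoker with exactly the values 'Male'/'Female' and 'Smoker'/'Non-smoker', that no other values or missing values occur (otherwise the number of rates will not match the number of strata), and it borrows the four prevalence rates given later in the thread. The output dataset names and the seed are made up for illustration.

```sas
/* The strata variables must be sorted before PROC SURVEYSELECT. */
proc sort data=x out=x_sorted;
  by sex Smoker;
run;

/* SAMPRATE values are listed in the sorted order of the strata:     */
/*   Female/Non-smoker, Female/Smoker, Male/Non-smoker, Male/Smoker  */
/* OUTALL keeps every observation and adds a 0/1 variable Selected,  */
/* which is renamed to asthma here.                                  */
proc surveyselect data=x_sorted
                  out=x_asthma(rename=(Selected=asthma))
                  method=srs
                  samprate=(0.0125 0.035 0.0192 0.054)
                  seed=12345
                  outall;
  strata sex Smoker;
run;
```

Unlike an independent random draw per observation, this selects a fixed number of cases in each stratum, so the observed rate in each stratum stays close to the requested rate (up to rounding of the stratum sample size).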
Hello,
Sorry, the 0.2% was just an example. Among non-smokers, the prevalence of asthma is 1.92% for males and 1.25% for females. Among smokers, it is 5.4% for males and 3.5% for females.
data x;
set x;
if Smoker= 'Smoker' AND sex='Male' then asthma= ifn(rand("uniform") <= 0.054, 1, 0);
if Smoker= 'Non-smoker' AND sex='Male' then asthma = ifn(rand("uniform") <= 0.0192, 1, 0);
run;
data x;
set x;
if Smoker= 'Smoker' AND sex='Female' then asthma= ifn(rand("uniform") <= 0.035, 1, 0);
if Smoker= 'Non-smoker' AND sex='Female' then asthma = ifn(rand("uniform") <= 0.0125, 1, 0);
run;
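For comparison, the two DATA steps above could be collapsed into a single pass. This is only a sketch under the same assumptions as the posted code (variables sex and Smoker with exactly these spellings and capitalization); the output dataset name x_asthma, the helper variable p, and the seed are hypothetical. It uses CALL STREAMINIT so the result is reproducible, and RAND("bernoulli", p), which returns 1 with probability p and 0 otherwise.

```sas
data x_asthma;
  set x;
  /* Fix the random-number seed once so reruns give the same result. */
  if _n_ = 1 then call streaminit(20240901);

  /* Look up the prevalence for this observation's stratum. */
  if      Smoker = 'Smoker'     and sex = 'Male'   then p = 0.054;
  else if Smoker = 'Non-smoker' and sex = 'Male'   then p = 0.0192;
  else if Smoker = 'Smoker'     and sex = 'Female' then p = 0.035;
  else if Smoker = 'Non-smoker' and sex = 'Female' then p = 0.0125;
  else p = .;  /* unmatched value, e.g. different spelling or case */

  /* Draw asthma status; leave it missing when no stratum matched. */
  if p ne . then asthma = rand("bernoulli", p);
  else asthma = .;

  drop p;
run;
```

Writing to a new dataset rather than back onto x keeps the original data intact, and a missing asthma value flags any rows whose sex or Smoker values did not match the expected text.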
So did you do a formal test of whether the resulting rates for the asthma assignments were significantly different from the expected rates?
My guess would be that your sample in one or more of the strata is relatively small.
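One way such a test might look is a binomial test of the observed proportion against the target rate, run one stratum at a time because each stratum has its own expected rate. This sketch assumes asthma is a numeric 0/1 variable in dataset x and that the variable names and values match the code posted earlier; it shows only the male smoker stratum.

```sas
/* Test H0: P(asthma = 1) = 0.054 among male smokers. */
proc freq data=x;
  where sex = 'Male' and Smoker = 'Smoker';
  tables asthma / binomial(level='1' p=0.054);
  exact binomial;  /* exact p-values, useful when counts are small */
run;
```

Repeating the step with the other three WHERE conditions and P= values covers the remaining strata.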
In a quick test generating 100 samples of size 100 from Rand('Uniform'), the number of values per sample less than or equal to 0.054 ranged from 1 to 10; 20% of the samples had 3 or fewer and 25% had more than 6.
You likely need around 1000 observations within each gender/smoking status group to get rates close to what you expect.
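A sketch of the kind of quick simulation described above, so the spread can be reproduced; the dataset name sim, the variable names, and the seed are arbitrary, and the exact counts will differ from the figures quoted because they depend on the random stream.

```sas
data sim;
  call streaminit(2024);
  do sample = 1 to 100;          /* 100 simulated samples          */
    hits = 0;
    do i = 1 to 100;             /* each sample has 100 draws      */
      /* the comparison evaluates to 1 when true, 0 when false */
      hits = hits + (rand("uniform") <= 0.054);
    end;
    output;                      /* one row per simulated sample   */
  end;
  keep sample hits;
run;

/* Distribution of the number of "asthma" cases per sample of 100. */
proc freq data=sim;
  tables hits;
run;
```

Changing the inner loop bound from 100 to 1000 shows how much the observed rate tightens around 5.4% as the stratum size grows, since the standard error of a proportion is sqrt(p*(1-p)/n).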