tinak
Calcite | Level 5

Hello,

 

I am trying to create a new variable and assign it to a given proportion of the observations. For example, a new variable, Asthma, assigned to 0.2% of the observations in the dataset based on smoking status, gender, etc. I am using the code below; however, I am getting strange output: the proportion with asthma comes out higher among non-smokers than among smokers. Does anyone know how to do this, or whether my code needs adjusting?

 

data X;

set X;

if Smoking= 'Smoker' and Sex='Male' then asthma= ifn(rand("uniform") <= 0.054, 1, 0);

if Smoking= 'Non-smoker' and Sex='Male' then asthma = ifn(rand("uniform") <= 0.0192, 1, 0);

run;

 

Thank you for your help in advance!

 

3 REPLIES
ballardw
Super User

It isn't clear if you are supposed to end up with 0.2% with asthma overall or in the smoking population.

 

Where did 0.054 and 0.0192 come from?

Do you have any females in the data?

 

Without data it's a bit hard to say.

 

What I would likely do, being of a slightly odd sort, is consider this a stratification and sample selection problem.

You would have 4 strata, Male/Female x Smoker/Non-smoker. You would provide a samprate or sampsize value for each stratum.
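A minimal sketch of that stratified-selection idea, assuming PROC SURVEYSELECT is the intended tool (samprate and sampsize are options of that procedure). It uses the four prevalence figures quoted later in this thread and assumes the variables are named Sex and Smoking with exactly the values 'Male'/'Female' and 'Smoker'/'Non-smoker'. The output dataset names and the seed are hypothetical.

```sas
/* PROC SURVEYSELECT expects the input sorted by the STRATA variables. */
proc sort data=x out=x_sorted;
  by Sex Smoking;
run;

/* SAMPRATE= values are listed in the order the strata appear in the
   sorted data: Female/Non-smoker, Female/Smoker,
                Male/Non-smoker,   Male/Smoker.
   OUTALL keeps every observation and adds a 0/1 variable, Selected. */
proc surveyselect data=x_sorted out=x_selected
     method=srs
     samprate=(0.0125 0.035 0.0192 0.054)
     seed=12345
     outall;
  strata Sex Smoking;
run;

/* Treat the selection flag as the asthma indicator. */
data x_asthma;
  set x_selected;
  asthma = Selected;
run;
```

Unlike an independent random draw per row, this selects a fixed number of rows in each stratum (the rate times the stratum size, which as far as I know is rounded up by default), so the assigned proportions match the targets as closely as the stratum sizes allow.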

tinak
Calcite | Level 5

Hello,

 

Sorry, the 0.2% was just an example. The prevalence of asthma among non-smokers is 1.92% for males and 1.25% for females. Among smokers it is 5.4% for males and 3.5% for females.

 

data x;

set x;

if Smoker= 'Smoker' AND sex='Male' then asthma= ifn(rand("uniform") <= 0.054, 1, 0);

if Smoker= 'Non-smoker' AND sex='Male' then asthma = ifn(rand("uniform") <= 0.0192, 1, 0);

run;

data x;

set x;

if Smoker= 'Smoker' AND sex='Female' then asthma= ifn(rand("uniform") <= 0.035, 1, 0);

if Smoker= 'Non-smoker' AND sex='Female' then asthma = ifn(rand("uniform") <= 0.0125, 1, 0);

run;
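The two steps above can also be written as a single pass. This is only a sketch under the same assumptions as the posted code (variables Smoker and sex with exactly these spellings); the output name x_asthma, the helper variable p, and the seed are hypothetical. String comparisons in SAS are case-sensitive, so a value such as 'Non-Smoker' would match none of the branches and leave asthma missing.

```sas
data x_asthma;
  set x;

  /* Fix the random-number seed once so the result is reproducible. */
  if _n_ = 1 then call streaminit(20240417);

  /* Look up the target prevalence for this row's stratum. */
  if      Smoker = 'Smoker'     and sex = 'Male'   then p = 0.054;
  else if Smoker = 'Non-smoker' and sex = 'Male'   then p = 0.0192;
  else if Smoker = 'Smoker'     and sex = 'Female' then p = 0.035;
  else if Smoker = 'Non-smoker' and sex = 'Female' then p = 0.0125;
  else p = .;

  /* Bernoulli draw with probability p; missing if no stratum matched. */
  if missing(p) then asthma = .;
  else asthma = rand('bernoulli', p);

  drop p;
run;

/* Row percentages show the realized asthma rate in each stratum. */
proc freq data=x_asthma;
  tables Smoker*sex*asthma / nocol nopercent;
run;
```

Writing to a new dataset rather than overwriting x keeps the original data intact if the step needs to be rerun.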

 

ballardw
Super User

So did you do a formal test of whether the resulting rates for the asthma assignments were significantly different from the expected rates?
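One possible way to run such a test is a binomial test in PROC FREQ, one stratum at a time. The sketch below is an assumption about what a "formal test" could look like here, not necessarily the one meant above. It uses the dataset x and the variable names from the posted code, and checks male smokers against the stated 5.4%.

```sas
proc freq data=x;
  /* Restrict to one stratum. */
  where Smoker = 'Smoker' and sex = 'Male';

  /* Test H0: P(asthma = 1) = 0.054 within this stratum. */
  tables asthma / binomial(level='1' p=0.054);

  /* Exact test, since the expected count may be small. */
  exact binomial;
run;
```

The same step can be repeated for the other three strata with the matching WHERE clause and p= value. If a stratum happens to contain no rows with asthma = 1, the level='1' request cannot be satisfied and the test will not run for that stratum.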

 

My guess would be that your sample in one or more of the strata is relatively small.

 

In a quick test generating 100 samples of size 100 from Rand('Uniform'), the number of values less than or equal to 0.054 ranged from 1 to 10 per sample; 20% of the samples had 3 or fewer and 25% had more than 6.
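A quick test of this kind could be reproduced with something like the following sketch. The dataset name sim, the variable names, and the seed are hypothetical, and a different seed will give a somewhat different spread than the figures quoted above.

```sas
data sim;
  call streaminit(2024);

  /* 100 samples, each of size 100. */
  do sample = 1 to 100;
    hits = 0;
    do i = 1 to 100;
      /* The comparison evaluates to 1 when true and 0 when false. */
      hits = hits + (rand('uniform') <= 0.054);
    end;
    output;
  end;

  keep sample hits;
run;

/* Distribution of the number of "asthma" assignments per sample. */
proc freq data=sim;
  tables hits;
run;
```

The cumulative percent column in the PROC FREQ output shows how often a sample of 100 lands well below or above the expected 5.4 hits.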

 

You likely need the sample within each gender/smoking-status group to be around 1000 to get rates close to what you expect.
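As a rough illustration of why group size matters, the standard error of a sample proportion is sqrt(p*(1-p)/n). The sketch below evaluates it for p = 0.054 at n = 100 and n = 1000. The dataset and variable names are hypothetical, and the 1.96 multiplier assumes a normal approximation, which is crude for counts this small.

```sas
data se_check;
  p = 0.054;
  do n = 100, 1000;
    se    = sqrt(p * (1 - p) / n);   /* standard error of the rate  */
    lower = p - 1.96 * se;           /* approximate 95% range, low  */
    upper = p + 1.96 * se;           /* approximate 95% range, high */
    output;
  end;
run;

proc print data=se_check noobs;
run;
```

The standard error is about 0.023 at n = 100 and about 0.007 at n = 1000, so with only 100 rows a realized rate anywhere from roughly 1% to 10% is unremarkable, while with 1000 rows it should usually fall within about 1.4 percentage points of 5.4%.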
