Hello,
I have a large dataset (~30,000 obs).
I have a variable called prescriptions, and there are many many different options. In order to adjust for certain drugs in my model I need to convert these to numerical. I also need to get a count of how many of each drug there is to see if it is worth adjusting for anyway.
Is there a quick/easy way?
I would hate to go one by one:
if drug="atorvastatin" then numdrug=1;
if ...etc for hundreds of drugs.
Thank you
Depends on your data and what you're trying to do. Why do you think you need to convert them to numeric values to start off?
If the drugs are consistently named you can sort and arbitrarily number them quite easily. If they are not consistently named then you have a more complicated problem.
proc sort data=have;
by drugName;
run;
data want;
set have;
by drugName;
retain drugNum 0;
if first.drugName then drugNum+1;
run;
FYI - 30k records is not a large dataset. The number of observations doesn't change the approach in this situation, but the number of drugs will.
@Student77 wrote:
Hello,
I have a large dataset (~30,000 obs).
I have a variable called prescriptions, and there are many many different options. In order to adjust for certain drugs in my model I need to convert these to numerical. I also need to get a count of how many of each drug there is to see if it is worth adjusting for anyway.
Is there a quick/easy way?
I would hate to go one by one:
if drug="atorvastatin" then numdrug=1;
if ...etc for hundreds of drugs.
Thank you
what will this do:
retain drugNum 0;
if first.drugName then drugNum+1;
What I am trying to do is a cox proportional hazards model, and I may need to adjust for contraceptive use, hormone therapy, and menopause treatment--but in order to know if this is necessary I first need a count of how many of each drug there is, and then I need to group these.
ie group together all the types of contraceptives, and then count #contraceptives
same with hormone therapy, and the other variables I'll create.
I do not know exactly how many different values there are, but they are consistently named
@Student77 wrote:
what will this do:
retain drugNum 0; if first.drugName then drugNum+1;
It creates a enumerator for your drugs which is what you've asked for. How this works is explained here and here.
30,000 observations is a (very) small data set.
Without actual data examples, it's hard to give better code than @Reeza 's.
How do we know a drug is a contraceptive, and what should the output look like?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.