BookmarkSubscribeRSS Feed
Student77
Obsidian | Level 7

Hello,

 

I have a large dataset (~30,000 obs).

 

I have a variable called prescriptions, and there are many many different options. In order to adjust for certain drugs in my model I need to convert these to numerical. I also need to get a count of how many of each drug there is to see if it is worth adjusting for anyway.

 

Is there a quick/easy way?

 

I would hate to go one by one:

if drug="atorvastatin" then numdrug=1;

if ...etc for hundreds of drugs.

 

Thank you

5 REPLIES 5
Reeza
Super User

Depends on your data and what you're trying to do. Why do you think you need to convert them to numeric values to start off?

 

If the drugs are consistently named you can sort and arbitrarily number them quite easily. If they are not consistently named then you have a more complicated problem. 

 

proc sort data=have;
by drugName;
run;

data want;
set have;

by drugName;
retain drugNum 0;

if first.drugName then drugNum+1;

run;

 

FYI - 30k records is not a large dataset.  The number of observations doesn't change the approach in this situation, but the number of drugs will. 

 


@Student77 wrote:

Hello,

 

I have a large dataset (~30,000 obs).

 

I have a variable called prescriptions, and there are many many different options. In order to adjust for certain drugs in my model I need to convert these to numerical. I also need to get a count of how many of each drug there is to see if it is worth adjusting for anyway.

 

Is there a quick/easy way?

 

I would hate to go one by one:

if drug="atorvastatin" then numdrug=1;

if ...etc for hundreds of drugs.

 

Thank you


 

 

Student77
Obsidian | Level 7

what will this do:

retain drugNum 0;

if first.drugName then drugNum+1;

 

What I am trying to do is a cox proportional hazards model, and I may need to adjust for contraceptive use, hormone therapy, and menopause treatment--but in order to know if this is necessary I first need a count of how many of each drug there is, and then I need to group these.

 

ie group together all the types of contraceptives, and then count #contraceptives

same with hormone therapy, and the other variables I'll create.

 

I do not know exactly how many different values there are, but they are consistently named

Patrick
Opal | Level 21

@Student77 wrote:

what will this do:

retain drugNum 0;

if first.drugName then drugNum+1;

 


It creates a enumerator for your drugs which is what you've asked for. How this works is explained here and here.

 

Reeza
Super User
Sounds like you'll need to recode your drugs then. Do you have a master table of which drugs belong to which groups? If so, you can either join to that table or create a format (slightly more efficient) and then do your analysis.

You can also create that table manually if you don't have it to save yourself time. That's probably the recommend route at this point.

ChrisNZ
Tourmaline | Level 20

30,000 observations is a (very) small data set.

Without actual data examples, it's hard to give better code than @Reeza 's.

How do we know a drug is a contraceptive, and what should the output look like?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 788 views
  • 0 likes
  • 4 in conversation