Hi, I am using SAS 9.4. I have wide data of all the drugs used by patients that looks like this:
ID | Drug1 | Drug2 | Drug3 | Drug4 | Drug5 | Drug6 | Drug7 |
1 | Ceftazidme | Ciprofloxacin | |||||
2 | Ceftazidme | Colistin | Vancomycin | ||||
3 | Cefazolin | Ceftazidme | Meropenem | ||||
4 | Cefepime | Minocycline | Colistin | ||||
… |
I would like to be able to tell how many people used each type of drug no matter which column the drugs are in, which is hard to do with the way my data are now (e.g., I want to know that 3 people used Ceftazidme, 2 people used Colistin, etc).
I would like my data to look like this:
ID | Ceftazidme | Ciprofloxacin | Colistin | Vancomycin | Cefazolin | Meropenem | Cefepime | Minocycline | … | Drug45 |
1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ||
2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ||
3 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ||
4 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | ||
… |
Although each ID had a maximum of 7 drugs, there are 45 unique drug names among all the IDs, so it's a little unwieldy to write out arrays for each drug name.
Any ideas would be great! Thanks in advance.
Try this
data have;
input ID (Drug1 - Drug7)(:$20.);
infile datalines missover;
datalines;
1 Ceftazidme Ciprofloxacin
2 Ceftazidme Colistin Vancomycin
3 Cefazolin Ceftazidme Meropenem
4 Cefepime Minocycline Colistin
;
data temp(keep = ID v drug);
set have;
array d Drug1 - Drug7;
do over d;
drug = d;
v = 1;
if drug ne '' then output;
end;
run;
proc transpose data = temp out = want(drop = _:);
by ID;
id drug;
var v;
run;
Try this
data have;
input ID (Drug1 - Drug7)(:$20.);
infile datalines missover;
datalines;
1 Ceftazidme Ciprofloxacin
2 Ceftazidme Colistin Vancomycin
3 Cefazolin Ceftazidme Meropenem
4 Cefepime Minocycline Colistin
;
data temp(keep = ID v drug);
set have;
array d Drug1 - Drug7;
do over d;
drug = d;
v = 1;
if drug ne '' then output;
end;
run;
proc transpose data = temp out = want(drop = _:);
by ID;
id drug;
var v;
run;
I would argue you actually don't want your data in your binary column arrangement.
Add a new drug and you have to add a new column.
I would argue more normalized data like:
ID | Drug | Date or dosage or whatever |
1 | Ceftazidme | |
1 | Ciprofloxacin | |
2 | Ceftazidme | |
2 | Colistin | |
2 | Vancomycin | |
3 | Cefazolin | |
3 | Ceftazidme | |
3 | Meropenem | |
4 | Minocycline | |
4 | Colistin |
which is essentially the format of the TEMP table in the @PeterClemmensen code would you do you more favors in the long run in terms of querying.
proc sql;
select drug, count(id) as "Number of IDs Taking Drug"n
from temp
group by drug;
quit;
drug | Number of IDs Taking Drug |
Cefazolin | 1 |
Cefepime | 1 |
Ceftazidme | 3 |
Ciprofloxacin | 1 |
Colistin | 2 |
Meropenem | 1 |
Minocycline | 1 |
Vancomycin | 1 |
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.