Hi guys,
I have a dataset of medication pick-up dates for a list of patients that I am trying to reformat. There are a total of 4 different drugs in the dataset. Each line is a unique person (key variable = id), each time they pick up a new drug at the pharmacy a new drugX_startdate and a corresponding drugX_enddate is added in wide format to the dataset. Sometimes the drug periods overlap (e.g. drugB_startdate1 is before drugA_enddate1). Most of the patients are only on one of the 4 drugs and will just have missing data on the other drugs. I want to get rid of the drug names and just get them all sorted as drug_start1-etc drug_end1-etc such that all patients in the dataset have a drug_start1 corresponding to whichever drug they start on. It varies from 0 to 45 on the number of pick-ups for any of the 4 drugs.
The dataset is currently in the format:
id drugA_start1 drugA_start2 drugA_start3 drugA_end1 drugA_end2 drugA_end3 drugB_start1 drugB_end1 etc
1. 01JAN2000 01FEB2000 01MAR2000 10JAN2000 10FEB2000 18MAR2000 02APR2000 20APR2000
2. 01AUG2001 01SEP2001 . 20AUG2001 18SEP2001 . 01JAN2001 31JAN2001
3. . . . . . . 01JAN2002 15FEB2002
4. 01JAN2003 01FEB2003 01MAR2003 10JAN2003 10FEB2003 18MAR2003 . .
Only there are many more startdates and enddates for each drug and there are a total of 4 different drugs.
I want to reformat the dataset into the following chronological order and get rid of the drug-specific prefix but such that drug_stat(x) still corresponds to drug_end(x):
id drug_start1 drug_start2 drug_start3 drug_start4 (etc) drug_end1 drug_end2 drug_end3 drug_end4 (etc)
1. 01JAN2000 01FEB2000 01MAR2000 02APR2000 10JAN2000 10FEB2000 18MAR2000 20APR2000
2. 01JAN2001 01AUG2001 01SEP2001 . 31JAN2001 20AUG2001 18SEP2001 .
3. 01JAN2002 . . . 15FEB2001 . . .
4. 01JAN2003 01FEB2003 01MAR2003 . 10JAN2003 10FEB2003 18MAR2003 .
I hope this makes sense 🙂 In the end I need to calculate the proportion of time on any medication as the number of days on medication/total days in period which is why I need the data like this. Can anyone give me some guidance on how to reformat the data like this? Thanks in advance!
Thank you Reeza! I am having difficulties creating the code for transposing to long and get the drugdates into a single column. Would it be possible for you to share some code on how to transpose it and how to use VNAME to get the drug names into a separate column? Thanks in advance!
In the future please show what you've tried.
data wide;
set have;
array _start(*) drugA_start: drugB_start:;
array _end(*) drugA_end: drugB_end:;
do i=1 to dim(_start);
start_date=_start(i);
end_date=_end(i);
Drug=scan(vname(_start(i)), 1, '_');
output;
*conditionally output for non misisng but you need to check your data quality;
*if nmiss(start_date, end_date) = 0 then output;
end;
keep ID drug start_date end_date;
format start_date end_date date9.;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.