Hi guys,
I have a dataset of medication pick-up dates for a list of patients that I am trying to reformat. There are a total of 4 different drugs in the dataset. Each line is a unique person (key variable = id), each time they pick up a new drug at the pharmacy a new drugX_startdate and a corresponding drugX_enddate is added in wide format to the dataset. Sometimes the drug periods overlap (e.g. drugB_startdate1 is before drugA_enddate1). Most of the patients are only on one of the 4 drugs and will just have missing data on the other drugs. I want to get rid of the drug names and just get them all sorted as drug_start1-etc drug_end1-etc such that all patients in the dataset have a drug_start1 corresponding to whichever drug they start on. It varies from 0 to 45 on the number of pick-ups for any of the 4 drugs.
The dataset is currently in the format:
id drugA_start1 drugA_start2 drugA_start3 drugA_end1 drugA_end2 drugA_end3 drugB_start1 drugB_end1 etc
1. 01JAN2000 01FEB2000 01MAR2000 10JAN2000 10FEB2000 18MAR2000 02APR2000 20APR2000
2. 01AUG2001 01SEP2001 . 20AUG2001 18SEP2001 . 01JAN2001 31JAN2001
3. . . . . . . 01JAN2002 15FEB2002
4. 01JAN2003 01FEB2003 01MAR2003 10JAN2003 10FEB2003 18MAR2003 . .
Only there are many more startdates and enddates for each drug and there are a total of 4 different drugs.
I want to reformat the dataset into the following chronological order and get rid of the drug-specific prefix but such that drug_stat(x) still corresponds to drug_end(x):
id drug_start1 drug_start2 drug_start3 drug_start4 (etc) drug_end1 drug_end2 drug_end3 drug_end4 (etc)
1. 01JAN2000 01FEB2000 01MAR2000 02APR2000 10JAN2000 10FEB2000 18MAR2000 20APR2000
2. 01JAN2001 01AUG2001 01SEP2001 . 31JAN2001 20AUG2001 18SEP2001 .
3. 01JAN2002 . . . 15FEB2001 . . .
4. 01JAN2003 01FEB2003 01MAR2003 . 10JAN2003 10FEB2003 18MAR2003 .
I hope this makes sense 🙂 In the end I need to calculate the proportion of time on any medication as the number of days on medication/total days in period which is why I need the data like this. Can anyone give me some guidance on how to reformat the data like this? Thanks in advance!
Thank you Reeza! I am having difficulties creating the code for transposing to long and get the drugdates into a single column. Would it be possible for you to share some code on how to transpose it and how to use VNAME to get the drug names into a separate column? Thanks in advance!
In the future please show what you've tried.
data wide;
set have;
array _start(*) drugA_start: drugB_start:;
array _end(*) drugA_end: drugB_end:;
do i=1 to dim(_start);
start_date=_start(i);
end_date=_end(i);
Drug=scan(vname(_start(i)), 1, '_');
output;
*conditionally output for non misisng but you need to check your data quality;
*if nmiss(start_date, end_date) = 0 then output;
end;
keep ID drug start_date end_date;
format start_date end_date date9.;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.