BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mlas
Calcite | Level 5

Hi guys,

I have a dataset of medication pick-up dates for a list of patients that I am trying to reformat. There are a total of 4 different drugs in the dataset. Each line is a unique person (key variable = id), each time they pick up a new drug at the pharmacy a new drugX_startdate and a corresponding drugX_enddate is added in wide format to the dataset. Sometimes the drug periods overlap (e.g. drugB_startdate1 is before drugA_enddate1). Most of the patients are only on one of the 4 drugs and will just have missing data on the other drugs. I want to get rid of the drug names and just get them all sorted as drug_start1-etc drug_end1-etc such that all patients in the dataset have a drug_start1 corresponding to whichever drug they start on. It varies from 0 to 45 on the number of pick-ups for any of the 4 drugs.

 

The dataset is currently in the format:

id drugA_start1 drugA_start2 drugA_start3 drugA_end1 drugA_end2 drugA_end3 drugB_start1 drugB_end1 etc

1.  01JAN2000  01FEB2000  01MAR2000  10JAN2000  10FEB2000 18MAR2000 02APR2000   20APR2000

2.  01AUG2001  01SEP2001           .          20AUG2001  18SEP2001           .          01JAN2001   31JAN2001

3.           .                     .                    .                    .                      .                  .          01JAN2002   15FEB2002

4.  01JAN2003  01FEB2003  01MAR2003  10JAN2003  10FEB2003 18MAR2003            .                    .

Only there are many more startdates and enddates for each drug and there are a total of 4 different drugs.

 

I want to reformat the dataset into the following chronological order and get rid of the drug-specific prefix but such that drug_stat(x) still corresponds to drug_end(x):

id   drug_start1   drug_start2   drug_start3   drug_start4  (etc) drug_end1  drug_end2   drug_end3   drug_end4 (etc)

1.  01JAN2000  01FEB2000  01MAR2000  02APR2000        10JAN2000 10FEB2000 18MAR2000 20APR2000

2.  01JAN2001  01AUG2001  01SEP2001            .                 31JAN2001 20AUG2001 18SEP2001          .       

3.  01JAN2002           .                      .                    .                 15FEB2001          .                  .                     .

4.  01JAN2003  01FEB2003  01MAR2003            .                 10JAN2003  10FEB2003  18MAR2003        .

 

I hope this makes sense 🙂 In the end I need to calculate the proportion of time on any medication as the number of days on medication/total days in period which is why I need the data like this. Can anyone give me some guidance on how to reformat the data like this? Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Change your data structure to a long format, order as desired and then if you really want (do not recommend) reflip it to a wide format again.

Format recommended:

ID DRUG StartDate EndDate

I would recommend a data step transpose in this case.
https://stats.oarc.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

Use the VNAME function to get the variable names/drug information.

View solution in original post

3 REPLIES 3
Reeza
Super User
Change your data structure to a long format, order as desired and then if you really want (do not recommend) reflip it to a wide format again.

Format recommended:

ID DRUG StartDate EndDate

I would recommend a data step transpose in this case.
https://stats.oarc.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

Use the VNAME function to get the variable names/drug information.

mlas
Calcite | Level 5

Thank you Reeza! I am having difficulties creating the code for transposing to long and get the drugdates into a single column. Would it be possible for you to share some code on how to transpose it and how to use VNAME to get the drug names into a separate column? Thanks in advance!

Reeza
Super User

In the future please show what you've tried.

 

data wide;
	set have;
	array _start(*) drugA_start: drugB_start:;
	array _end(*) drugA_end: drugB_end:;

	do i=1 to dim(_start);
		start_date=_start(i);
		end_date=_end(i);
		Drug=scan(vname(_start(i)), 1, '_');
		output;
		*conditionally output for non misisng but you need to check your data quality;
		*if nmiss(start_date, end_date) = 0 then output;
	end;
	keep ID drug start_date end_date;
	format start_date end_date date9.;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1096 views
  • 3 likes
  • 2 in conversation