BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mlas
Calcite | Level 5

Hi guys,

I have a dataset of medication pick-up dates for a list of patients that I am trying to reformat. There are a total of 4 different drugs in the dataset. Each line is a unique person (key variable = id), each time they pick up a new drug at the pharmacy a new drugX_startdate and a corresponding drugX_enddate is added in wide format to the dataset. Sometimes the drug periods overlap (e.g. drugB_startdate1 is before drugA_enddate1). Most of the patients are only on one of the 4 drugs and will just have missing data on the other drugs. I want to get rid of the drug names and just get them all sorted as drug_start1-etc drug_end1-etc such that all patients in the dataset have a drug_start1 corresponding to whichever drug they start on. It varies from 0 to 45 on the number of pick-ups for any of the 4 drugs.

 

The dataset is currently in the format:

id drugA_start1 drugA_start2 drugA_start3 drugA_end1 drugA_end2 drugA_end3 drugB_start1 drugB_end1 etc

1.  01JAN2000  01FEB2000  01MAR2000  10JAN2000  10FEB2000 18MAR2000 02APR2000   20APR2000

2.  01AUG2001  01SEP2001           .          20AUG2001  18SEP2001           .          01JAN2001   31JAN2001

3.           .                     .                    .                    .                      .                  .          01JAN2002   15FEB2002

4.  01JAN2003  01FEB2003  01MAR2003  10JAN2003  10FEB2003 18MAR2003            .                    .

Only there are many more startdates and enddates for each drug and there are a total of 4 different drugs.

 

I want to reformat the dataset into the following chronological order and get rid of the drug-specific prefix but such that drug_stat(x) still corresponds to drug_end(x):

id   drug_start1   drug_start2   drug_start3   drug_start4  (etc) drug_end1  drug_end2   drug_end3   drug_end4 (etc)

1.  01JAN2000  01FEB2000  01MAR2000  02APR2000        10JAN2000 10FEB2000 18MAR2000 20APR2000

2.  01JAN2001  01AUG2001  01SEP2001            .                 31JAN2001 20AUG2001 18SEP2001          .       

3.  01JAN2002           .                      .                    .                 15FEB2001          .                  .                     .

4.  01JAN2003  01FEB2003  01MAR2003            .                 10JAN2003  10FEB2003  18MAR2003        .

 

I hope this makes sense 🙂 In the end I need to calculate the proportion of time on any medication as the number of days on medication/total days in period which is why I need the data like this. Can anyone give me some guidance on how to reformat the data like this? Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Change your data structure to a long format, order as desired and then if you really want (do not recommend) reflip it to a wide format again.

Format recommended:

ID DRUG StartDate EndDate

I would recommend a data step transpose in this case.
https://stats.oarc.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

Use the VNAME function to get the variable names/drug information.

View solution in original post

3 REPLIES 3
Reeza
Super User
Change your data structure to a long format, order as desired and then if you really want (do not recommend) reflip it to a wide format again.

Format recommended:

ID DRUG StartDate EndDate

I would recommend a data step transpose in this case.
https://stats.oarc.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

Use the VNAME function to get the variable names/drug information.

mlas
Calcite | Level 5

Thank you Reeza! I am having difficulties creating the code for transposing to long and get the drugdates into a single column. Would it be possible for you to share some code on how to transpose it and how to use VNAME to get the drug names into a separate column? Thanks in advance!

Reeza
Super User

In the future please show what you've tried.

 

data wide;
	set have;
	array _start(*) drugA_start: drugB_start:;
	array _end(*) drugA_end: drugB_end:;

	do i=1 to dim(_start);
		start_date=_start(i);
		end_date=_end(i);
		Drug=scan(vname(_start(i)), 1, '_');
		output;
		*conditionally output for non misisng but you need to check your data quality;
		*if nmiss(start_date, end_date) = 0 then output;
	end;
	keep ID drug start_date end_date;
	format start_date end_date date9.;
run;

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 933 views
  • 3 likes
  • 2 in conversation