BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zihdonv19
Quartz | Level 8

I have a dataset like this:

 

data medication;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,10/10/2007,OAD,OAD
1,10/10/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,10/10/2007,OAD,OAD
2,10/10/2007,16/01/2008,OAD,OAD
2,16/01/2008,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

 

For each id, I want to merge their date variables if start_med=end_med=end_med in next row. In this example, the desired dataset should be:

 

data medication1;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

That is a strange starting point, but it does make the job easier.

data want ;
  set medication ;
  by id start_med notsorted;
  if first.start_med then first=start;
  retain first;
  if last.start_med;
  start=first;
  drop first;
run;

A more normal starting point would be to have observations that only know about the current medications and know nothing about what happened in the future.  So something like:

data have;
  length id $8 start end 8 med $10;
  informat start end yymmdd.; 
  format start end yymmdd10.; 
  input id -- med ;
datalines;
1 2006-10-10 2007-09-09 insulin
1 2007-09-09 2007-10-10 OAD
1 2007-10-10 2008-01-16 OAD
1 2008-01-16 2009-08-06 OAD
1 2009-08-06 2009-08-30 insulin
2 2006-10-10 2007-09-09 insulin
2 2007-09-09 2007-10-10 OAD
2 2007-10-10 2008-01-16 OAD
2 2008-01-16 2008-01-19 OAD
2 2008-01-19 2009-08-06 OAD
2 2009-08-06 2009-08-30 insulin
;

So that if you wanted to know whether the following observation indicates either a change in medication (or perhaps a gap with no medication) you would have to "look ahead".  Are you sure that is not the problem you need to solve?

View solution in original post

7 REPLIES 7
ballardw
Super User

Please make sure that your discussion/description/requirements use variables in data sets.

Your requirement: (besides being a little confusing as to intent) " merge their date variables if start_med=end_med=end_med in next row. " Is not possible with any of the example data as there are no variables start_med or end_med included in either data set. If we have to guess which variables and "rows" are intended results are often poor.

 

Assuming by "start_med" and "end_med" you meant "start" and "end", I see no observations in either data set where "start_med=end_med". So either example data is incomplete to provide a working example OR that requirement needs a lot of additional description. Perhaps those colored highlights have some meaning. If so, please describe it. Guessing as to meaning is a poor way to program.

 


@zihdonv19 wrote:

I have a dataset like this:

 

data medication;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,10/10/2007,OAD,OAD
1,10/10/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,10/10/2007,OAD,OAD
2,10/10/2007,16/01/2008,OAD,OAD
2,16/01/2008,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

 

For each id, I want to merge their date variables if start_med=end_med=end_med in next row. In this example, the desired dataset should be:

 

data medication1;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;


 

zihdonv19
Quartz | Level 8
Hi, sorry for the typo. I mean "start_med=next_med=next_med in next row".
A_Kh
Barite | Level 11

I would separate the required part of data using condition, then process it using BY/GROUP and set it back with the main data. It requires multiple data steps though. 

proc sort data=medication out=have;
	where start_med=next_med;
	by id start_med start next;
run; 
data have1; 
	set have;
	by id start_med start next;
	retain startdt;
	if first.start_med then startdt=start;
	if last.start_med then start=startdt;
	if last.start_med;
	drop startdt;
run; 
data want;
	set medication (where=(start_med^=next_med)) have1;
	by id;
run; 
proc sort data=want;
	by id start next;
run; 
Tom
Super User Tom
Super User

That is a strange starting point, but it does make the job easier.

data want ;
  set medication ;
  by id start_med notsorted;
  if first.start_med then first=start;
  retain first;
  if last.start_med;
  start=first;
  drop first;
run;

A more normal starting point would be to have observations that only know about the current medications and know nothing about what happened in the future.  So something like:

data have;
  length id $8 start end 8 med $10;
  informat start end yymmdd.; 
  format start end yymmdd10.; 
  input id -- med ;
datalines;
1 2006-10-10 2007-09-09 insulin
1 2007-09-09 2007-10-10 OAD
1 2007-10-10 2008-01-16 OAD
1 2008-01-16 2009-08-06 OAD
1 2009-08-06 2009-08-30 insulin
2 2006-10-10 2007-09-09 insulin
2 2007-09-09 2007-10-10 OAD
2 2007-10-10 2008-01-16 OAD
2 2008-01-16 2008-01-19 OAD
2 2008-01-19 2009-08-06 OAD
2 2009-08-06 2009-08-30 insulin
;

So that if you wanted to know whether the following observation indicates either a change in medication (or perhaps a gap with no medication) you would have to "look ahead".  Are you sure that is not the problem you need to solve?

zihdonv19
Quartz | Level 8

Hi!

 

The ultimate aim of this data cleaning is to find out treatment pattern of this population (as you said, medication change is one the important things that we want to capture). In this thread, I want to ask how to merge the medication records/corresponding prescription time for the same medications (e.g., OAD), so the dataset won't look so redundant.

 

In my data, the start_med and next_med are without missing values. 

 

Hope it clarifies a bit.

Ksharp
Super User

If I understood what you are looking for.

 

data medication;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,10/10/2007,OAD,OAD
1,10/10/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,10/10/2007,OAD,OAD
2,10/10/2007,16/01/2008,OAD,OAD
2,16/01/2008,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;
data want;
 set medication;
 by id start_med  next_med notsorted;
 retain temp;
 if first.next_med then temp=start;
 if last.next_med then do;start=temp;output;end;
 drop temp;
run;
zihdonv19
Quartz | Level 8

Thank you. It works!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1748 views
  • 3 likes
  • 5 in conversation