Solved: Re: merge rows with condition

zihdonv19 · Posted 10-03-2024 12:13 PM

I have a dataset like this:

data medication;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,10/10/2007,OAD,OAD
1,10/10/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,10/10/2007,OAD,OAD
2,10/10/2007,16/01/2008,OAD,OAD
2,16/01/2008,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

For each id, I want to merge their date variables if start_med=end_med=end_med in next row. In this example, the desired dataset should be:

data medication1;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

Tom · Posted 10-03-2024 01:34 PM

That is a strange starting point, but it does make the job easier.

data want ;
  set medication ;
  by id start_med notsorted;
  if first.start_med then first=start;
  retain first;
  if last.start_med;
  start=first;
  drop first;
run;

A more normal starting point would be to have observations that only know about the current medications and know nothing about what happened in the future. So something like:

data have;
  length id $8 start end 8 med $10;
  informat start end yymmdd.; 
  format start end yymmdd10.; 
  input id -- med ;
datalines;
1 2006-10-10 2007-09-09 insulin
1 2007-09-09 2007-10-10 OAD
1 2007-10-10 2008-01-16 OAD
1 2008-01-16 2009-08-06 OAD
1 2009-08-06 2009-08-30 insulin
2 2006-10-10 2007-09-09 insulin
2 2007-09-09 2007-10-10 OAD
2 2007-10-10 2008-01-16 OAD
2 2008-01-16 2008-01-19 OAD
2 2008-01-19 2009-08-06 OAD
2 2009-08-06 2009-08-30 insulin
;

So that if you wanted to know whether the following observation indicates either a change in medication (or perhaps a gap with no medication) you would have to "look ahead". Are you sure that is not the problem you need to solve?

View solution in original post

ballardw · Posted 10-03-2024 12:52 PM

Please make sure that your discussion/description/requirements use variables in data sets.

Your requirement: (besides being a little confusing as to intent) " merge their date variables if start_med=end_med=end_med in next row. " Is not possible with any of the example data as there are no variables start_med or end_med included in either data set. If we have to guess which variables and "rows" are intended results are often poor.

Assuming by "start_med" and "end_med" you meant "start" and "end", I see no observations in either data set where "start_med=end_med". So either example data is incomplete to provide a working example OR that requirement needs a lot of additional description. Perhaps those colored highlights have some meaning. If so, please describe it. Guessing as to meaning is a poor way to program.

@zihdonv19 wrote:

I have a dataset like this:

data medication;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,10/10/2007,OAD,OAD
1,10/10/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,10/10/2007,OAD,OAD
2,10/10/2007,16/01/2008,OAD,OAD
2,16/01/2008,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

For each id, I want to merge their date variables if start_med=end_med=end_med in next row. In this example, the desired dataset should be:

data medication1;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;

zihdonv19 · Posted 10-03-2024 12:56 PM

Hi, sorry for the typo. I mean "start_med=next_med=next_med in next row".

A_Kh · Posted 10-03-2024 01:05 PM

I would separate the required part of data using condition, then process it using BY/GROUP and set it back with the main data. It requires multiple data steps though.

proc sort data=medication out=have;
	where start_med=next_med;
	by id start_med start next;
run; 
data have1; 
	set have;
	by id start_med start next;
	retain startdt;
	if first.start_med then startdt=start;
	if last.start_med then start=startdt;
	if last.start_med;
	drop startdt;
run; 
data want;
	set medication (where=(start_med^=next_med)) have1;
	by id;
run; 
proc sort data=want;
	by id start next;
run;

Tom · Posted 10-03-2024 01:34 PM

That is a strange starting point, but it does make the job easier.

data want ;
  set medication ;
  by id start_med notsorted;
  if first.start_med then first=start;
  retain first;
  if last.start_med;
  start=first;
  drop first;
run;

A more normal starting point would be to have observations that only know about the current medications and know nothing about what happened in the future. So something like:

data have;
  length id $8 start end 8 med $10;
  informat start end yymmdd.; 
  format start end yymmdd10.; 
  input id -- med ;
datalines;
1 2006-10-10 2007-09-09 insulin
1 2007-09-09 2007-10-10 OAD
1 2007-10-10 2008-01-16 OAD
1 2008-01-16 2009-08-06 OAD
1 2009-08-06 2009-08-30 insulin
2 2006-10-10 2007-09-09 insulin
2 2007-09-09 2007-10-10 OAD
2 2007-10-10 2008-01-16 OAD
2 2008-01-16 2008-01-19 OAD
2 2008-01-19 2009-08-06 OAD
2 2009-08-06 2009-08-30 insulin
;

So that if you wanted to know whether the following observation indicates either a change in medication (or perhaps a gap with no medication) you would have to "look ahead". Are you sure that is not the problem you need to solve?

zihdonv19 · Posted 10-03-2024 01:45 PM

Hi!

The ultimate aim of this data cleaning is to find out treatment pattern of this population (as you said, medication change is one the important things that we want to capture). In this thread, I want to ask how to merge the medication records/corresponding prescription time for the same medications (e.g., OAD), so the dataset won't look so redundant.

In my data, the start_med and next_med are without missing values.

Hope it clarifies a bit.

Ksharp · Posted 10-03-2024 10:24 PM

If I understood what you are looking for.

data medication;
infile datalines dsd truncover;
input id $ start : ddmmyy10. next : ddmmyy10. start_med $ next_med $;
format start next yymmdd10.; /* Change format to yymmdd10. */
datalines;
1,10/10/2006,09/09/2007,insulin,OAD
1,09/09/2007,10/10/2007,OAD,OAD
1,10/10/2007,16/01/2008,OAD,OAD
1,16/01/2008,06/08/2009,OAD,insulin
1,06/08/2009,30/08/2009,insulin,insulin
2,10/10/2006,09/09/2007,insulin,OAD
2,09/09/2007,10/10/2007,OAD,OAD
2,10/10/2007,16/01/2008,OAD,OAD
2,16/01/2008,19/01/2008,OAD,OAD
2,19/01/2008,06/08/2009,OAD,insulin
2,06/08/2009,30/08/2009,insulin,insulin
;
run;
data want;
 set medication;
 by id start_med  next_med notsorted;
 retain temp;
 if first.next_med then temp=start;
 if last.next_med then do;start=temp;output;end;
 drop temp;
run;

zihdonv19 · Posted 10-04-2024 05:28 AM

Thank you. It works!

Registration is open

SAS Training: Just a Click Away