Solved: Pick up values from multiple rows of same ID if meeting criteria

Solph · Posted 12-27-2022 01:55 PM

I've a data consisting of a list of IDs, and within each ID are episode IDs (EPID) with values of start date and end date and other data

elements (e.g. days on medication and drug name). I want to consolidate episodes if consecutive records (meaning end-start of the next record is within (0,1) values - some are from the first record (start), some from the last record (End, Drug), some (DrugDays) are from sum of all episodes.

The data input lines below shows

HAVE data from ID to Drug

Seq is to show these records should be consolidated

EPID_1 to DRUG_1 are WANT value (if records have empty values they will be removed from output of course).

I guess if I'm able to create the SEQ variable, the rest is easy. Any help is appreciated.

data have; format ID EPID Start End Drugdays Drug SEQ EPID_1 Start_1 End_1 DrugDays_1 Drug_1; drop startDT EndDT StartDT_1 EndDT_1;
format start end start_1 end_1 yymmdd10.;
input ID 1 EPID 3-5 StartDT $ 7-16  EndDT $ 18-27
DrugDays 29-30 Drug $ 32 Seq 34
EPID_1 36-38  StartDT_1 $ 40-49 EndDT_1 $ 51-60
DrugDays_1 62-63 Drug_1 $ 65 ;
Start=input(StartDT,yymmdd10.);
END=input(ENDDT,yymmdd10.);
Start_1=input(StartDT_1,yymmdd10.);
END_1=input(ENDDT_1,yymmdd10.);
datalines;
1 101 2018-05-24 2018-05-31  7 A 1 101 2018-05-24 2018-10-05 30 C
1 102 2018-05-31 2018-06-08  8 B 1                               
1 103 2018-06-08 2018-10-05 15 C 1                               
2 201 2019-04-09 2019-04-11  2 A 1 201 2019-04-09 2019-04-27 15 B
2 202 2019-04-12 2019-04-17  3 C 1                               
2 203 2019-04-17 2019-04-25  5 B 1                               
2 204 2019-04-25 2019-04-27  5 B 1                               
2 205 2019-04-30 2019-05-07  6 A 2 205 2019-04-30 2019-05-10 12 C
2 206 2019-05-08 2019-05-10  6 C 2 
2 207 2019-06-15 2019-06-30  5 A 3 207 2019-06-15 2019-06-30  5 A
3 301 2021-07-14 2021-07-15  1 C 1 301 2021-07-14 2021-07-19  4 A
3 302 2021-07-15 2021-07-19  3 A 1                               
4 401 2017-11-02 2017-11-19 10 C 1 401 2017-11-02 2017-11-19 10 C
4 402 2019-07-26 2019-07-27  1 B 2 402 2019-07-26 2019-11-11 46 C
4 403 2019-07-27 2019-08-02  5 A 2                               
4 404 2019-08-02 2019-11-11 40 C 2                               
5 501 2018-04-22 2018-04-23  1 A 1 501 2018-04-22 2018-04-25  3 B
5 502 2018-04-23 2018-04-25  2 B 1                               
5 503 2019-08-08 2019-08-10  1 C 2 503 2019-08-08 2019-08-15  4 A
5 504 2019-08-10 2019-08-15  3 A 2                               
6 601 2017-02-04 2017-02-14  5 C 1 601 2017-02-04 2017-02-26 15 A
6 602 2017-02-14 2017-02-26 10 A 1                               
6 603 2019-11-24 2019-11-30  1 C 2 603 2019-11-24 2019-12-28  4 B
6 604 2019-11-30 2019-12-28  3 B 2                               
7 701 2019-10-20 2019-11-10 10 A 1 701 2019-10-20 2019-11-10 10 A
;

Solph · Posted 12-27-2022 10:44 PM

There was a typo in the reading data column, so it's fixed and a couple more records were added to ID=2.

Took me a while to search similar codes that work for my needs. So the code below, though not neat, works.

proc sort data=have; by ID start; run;

data tmp; format seq_1; set have;
	by ID;
	flag1=(start-lag(end) in (0,1));
	flag2=( (start-lag(end) in (0,1)) and (lag(start)-lag2(end) in (0,1)));
	if flag1=1 and flag2=0 then flag2=1;
	if first.ID then flag1=1;
	if first.ID then flag2=1;

	if first.ID then seq_1=1;
	if lag(flag2)>flag2 then seq_1+1;

	drop flag1 flag2;
run;
proc sort data=tmp; by ID Seq_1 start ; run;

data first last; set tmp;
	by ID Seq_1 start;
	if first.seq_1 then output first;
	if last.seq_1 then output last;
run;
proc sql; create table want as
	select a.ID, a.EPID as EPID_1, a.Start as Start_1, b.End as End_1, c.drugdays as DrugDays_1,
	b.drug as Drug_1 
	from first as a 
	left join last as b on a.id=b.id and a.seq_1=b.seq_1
	left join (select ID, Seq_1, sum(Drugdays) as Drugdays from tmp group by ID, Seq_1) as c on a.id=c.id and a.seq_1=c.seq_1;
quit;

To compare,

data have2; set have; if start_1>.; run;
proc print data=have2 noobs; run;
proc print data=want noobs; run;

proc compare base=have2 compare=want; run;

or the final output form WANT:

ID	EPID_1	Start_1	End_1	DrugDays_1	Drug_1
2	201	2019-04-09	2019-04-27	15	B
2	205	2019-04-30	2019-05-10	12	C
2	207	2019-06-15	2019-06-30	5	A
3	301	2021-07-14	2021-07-19	4	A
4	401	2017-11-02	2017-11-19	10	C
4	402	2019-07-26	2019-11-11	46	C
5	501	2018-04-22	2018-04-25	3	B
5	503	2019-08-08	2019-08-15	4	A
6	601	2017-02-04	2017-02-26	15	A
6	603	2019-11-24	2019-12-28	4	B
7	701	2019-10-20	2019-11-10	10	A

View solution in original post

Solph · Posted 12-27-2022 02:06 PM

Patrick · Posted 12-27-2022 06:22 PM

Really appreciated that you provided sample data via a fully working SAS data step and that you also show us the desired result.

Looking at your data I tried to figure out the logic how you would derive the value for SEQ and Drug_1 - but to no avail.

Can you please explain the logic that leads to drug_1 being C in the first two rows below? I guess that's then what leads to SEQ having a value of 2 in the second row.
Can you please also explain why the third and fourth row would be excluded?

Solph · Posted 12-27-2022 10:44 PM

There was a typo in the reading data column, so it's fixed and a couple more records were added to ID=2.

Took me a while to search similar codes that work for my needs. So the code below, though not neat, works.

proc sort data=have; by ID start; run;

data tmp; format seq_1; set have;
	by ID;
	flag1=(start-lag(end) in (0,1));
	flag2=( (start-lag(end) in (0,1)) and (lag(start)-lag2(end) in (0,1)));
	if flag1=1 and flag2=0 then flag2=1;
	if first.ID then flag1=1;
	if first.ID then flag2=1;

	if first.ID then seq_1=1;
	if lag(flag2)>flag2 then seq_1+1;

	drop flag1 flag2;
run;
proc sort data=tmp; by ID Seq_1 start ; run;

data first last; set tmp;
	by ID Seq_1 start;
	if first.seq_1 then output first;
	if last.seq_1 then output last;
run;
proc sql; create table want as
	select a.ID, a.EPID as EPID_1, a.Start as Start_1, b.End as End_1, c.drugdays as DrugDays_1,
	b.drug as Drug_1 
	from first as a 
	left join last as b on a.id=b.id and a.seq_1=b.seq_1
	left join (select ID, Seq_1, sum(Drugdays) as Drugdays from tmp group by ID, Seq_1) as c on a.id=c.id and a.seq_1=c.seq_1;
quit;

To compare,

data have2; set have; if start_1>.; run;
proc print data=have2 noobs; run;
proc print data=want noobs; run;

proc compare base=have2 compare=want; run;

or the final output form WANT:

ID	EPID_1	Start_1	End_1	DrugDays_1	Drug_1
2	201	2019-04-09	2019-04-27	15	B
2	205	2019-04-30	2019-05-10	12	C
2	207	2019-06-15	2019-06-30	5	A
3	301	2021-07-14	2021-07-19	4	A
4	401	2017-11-02	2017-11-19	10	C
4	402	2019-07-26	2019-11-11	46	C
5	501	2018-04-22	2018-04-25	3	B
5	503	2019-08-08	2019-08-15	4	A
6	601	2017-02-04	2017-02-26	15	A
6	603	2019-11-24	2019-12-28	4	B
7	701	2019-10-20	2019-11-10	10	A

gema · Posted 01-09-2023 01:50 PM

can you create first dataset if first.id and rest dataset else if not first.id. merge those two data sets by id if diff(any of the dates) is <= 1)

Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Re: Pick up values from multiple rows of same ID if meeting criteria

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!