BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Solph
Pyrite | Level 9

I've a data consisting of a list of IDs, and within each ID are episode IDs (EPID) with values of start date and end date and other data 

elements (e.g. days on medication and drug name). I want to consolidate episodes if consecutive records (meaning end-start of the next record is within (0,1) values - some are from the first record (start), some from the last record (End, Drug), some (DrugDays) are from sum of all episodes.

 

The data input lines below shows

HAVE data from ID to Drug  

Seq is to show these records should be consolidated

EPID_1 to DRUG_1 are WANT value (if records have empty values they will be removed from output of course).

 

I guess if I'm able to create the SEQ variable, the rest is easy. Any help is appreciated.

 

data have; format ID EPID Start End Drugdays Drug SEQ EPID_1 Start_1 End_1 DrugDays_1 Drug_1; drop startDT EndDT StartDT_1 EndDT_1;
format start end start_1 end_1 yymmdd10.;
input ID 1 EPID 3-5 StartDT $ 7-16  EndDT $ 18-27
DrugDays 29-30 Drug $ 32 Seq 34
EPID_1 36-38  StartDT_1 $ 40-49 EndDT_1 $ 51-60
DrugDays_1 62-63 Drug_1 $ 65 ;
Start=input(StartDT,yymmdd10.);
END=input(ENDDT,yymmdd10.);
Start_1=input(StartDT_1,yymmdd10.);
END_1=input(ENDDT_1,yymmdd10.);
datalines;
1 101 2018-05-24 2018-05-31  7 A 1 101 2018-05-24 2018-10-05 30 C
1 102 2018-05-31 2018-06-08  8 B 1                               
1 103 2018-06-08 2018-10-05 15 C 1                               
2 201 2019-04-09 2019-04-11  2 A 1 201 2019-04-09 2019-04-27 15 B
2 202 2019-04-12 2019-04-17  3 C 1                               
2 203 2019-04-17 2019-04-25  5 B 1                               
2 204 2019-04-25 2019-04-27  5 B 1                               
2 205 2019-04-30 2019-05-07  6 A 2 205 2019-04-30 2019-05-10 12 C
2 206 2019-05-08 2019-05-10  6 C 2 
2 207 2019-06-15 2019-06-30  5 A 3 207 2019-06-15 2019-06-30  5 A
3 301 2021-07-14 2021-07-15  1 C 1 301 2021-07-14 2021-07-19  4 A
3 302 2021-07-15 2021-07-19  3 A 1                               
4 401 2017-11-02 2017-11-19 10 C 1 401 2017-11-02 2017-11-19 10 C
4 402 2019-07-26 2019-07-27  1 B 2 402 2019-07-26 2019-11-11 46 C
4 403 2019-07-27 2019-08-02  5 A 2                               
4 404 2019-08-02 2019-11-11 40 C 2                               
5 501 2018-04-22 2018-04-23  1 A 1 501 2018-04-22 2018-04-25  3 B
5 502 2018-04-23 2018-04-25  2 B 1                               
5 503 2019-08-08 2019-08-10  1 C 2 503 2019-08-08 2019-08-15  4 A
5 504 2019-08-10 2019-08-15  3 A 2                               
6 601 2017-02-04 2017-02-14  5 C 1 601 2017-02-04 2017-02-26 15 A
6 602 2017-02-14 2017-02-26 10 A 1                               
6 603 2019-11-24 2019-11-30  1 C 2 603 2019-11-24 2019-12-28  4 B
6 604 2019-11-30 2019-12-28  3 B 2                               
7 701 2019-10-20 2019-11-10 10 A 1 701 2019-10-20 2019-11-10 10 A
;
1 ACCEPTED SOLUTION

Accepted Solutions
Solph
Pyrite | Level 9

There was a typo in the reading data column, so it's fixed and a couple more records were added to ID=2.

 

Took me a while to search similar codes that work for my needs. So the code below, though not neat, works.

 

proc sort data=have; by ID start; run;

data tmp; format seq_1; set have;
	by ID;
	flag1=(start-lag(end) in (0,1));
	flag2=( (start-lag(end) in (0,1)) and (lag(start)-lag2(end) in (0,1)));
	if flag1=1 and flag2=0 then flag2=1;
	if first.ID then flag1=1;
	if first.ID then flag2=1;

	if first.ID then seq_1=1;
	if lag(flag2)>flag2 then seq_1+1;

	drop flag1 flag2;
run;
proc sort data=tmp; by ID Seq_1 start ; run;

data first last; set tmp;
	by ID Seq_1 start;
	if first.seq_1 then output first;
	if last.seq_1 then output last;
run;
proc sql; create table want as
	select a.ID, a.EPID as EPID_1, a.Start as Start_1, b.End as End_1, c.drugdays as DrugDays_1,
	b.drug as Drug_1 
	from first as a 
	left join last as b on a.id=b.id and a.seq_1=b.seq_1
	left join (select ID, Seq_1, sum(Drugdays) as Drugdays from tmp group by ID, Seq_1) as c on a.id=c.id and a.seq_1=c.seq_1;
quit;

To compare, 

data have2; set have; if start_1>.; run;
proc print data=have2 noobs; run;
proc print data=want noobs; run;

proc compare base=have2 compare=want; run;

or the final output form WANT:


ID EPID_1 Start_1 End_1 DrugDays_1 Drug_1
2 201 2019-04-09 2019-04-27 15 B
2 205 2019-04-30 2019-05-10 12 C
2 207 2019-06-15 2019-06-30 5 A
3 301 2021-07-14 2021-07-19 4 A
4 401 2017-11-02 2017-11-19 10 C
4 402 2019-07-26 2019-11-11 46 C
5 501 2018-04-22 2018-04-25 3 B
5 503 2019-08-08 2019-08-15 4 A
6 601 2017-02-04 2017-02-26 15 A
6 603 2019-11-24 2019-12-28 4 B
7 701 2019-10-20 2019-11-10 10 A

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Really appreciated that you provided sample data via a fully working SAS data step and that you also show us the desired result. 

 

Looking at your data I tried to figure out the logic how you would derive the value for SEQ and Drug_1 - but to no avail.

Can you please explain the logic that leads to drug_1 being C in the first two rows below? I guess that's then what leads to SEQ having a value of 2 in the second row.
Can you please also explain why the third and fourth row would be excluded?

Patrick_0-1672183227258.png

 

Solph
Pyrite | Level 9

There was a typo in the reading data column, so it's fixed and a couple more records were added to ID=2.

 

Took me a while to search similar codes that work for my needs. So the code below, though not neat, works.

 

proc sort data=have; by ID start; run;

data tmp; format seq_1; set have;
	by ID;
	flag1=(start-lag(end) in (0,1));
	flag2=( (start-lag(end) in (0,1)) and (lag(start)-lag2(end) in (0,1)));
	if flag1=1 and flag2=0 then flag2=1;
	if first.ID then flag1=1;
	if first.ID then flag2=1;

	if first.ID then seq_1=1;
	if lag(flag2)>flag2 then seq_1+1;

	drop flag1 flag2;
run;
proc sort data=tmp; by ID Seq_1 start ; run;

data first last; set tmp;
	by ID Seq_1 start;
	if first.seq_1 then output first;
	if last.seq_1 then output last;
run;
proc sql; create table want as
	select a.ID, a.EPID as EPID_1, a.Start as Start_1, b.End as End_1, c.drugdays as DrugDays_1,
	b.drug as Drug_1 
	from first as a 
	left join last as b on a.id=b.id and a.seq_1=b.seq_1
	left join (select ID, Seq_1, sum(Drugdays) as Drugdays from tmp group by ID, Seq_1) as c on a.id=c.id and a.seq_1=c.seq_1;
quit;

To compare, 

data have2; set have; if start_1>.; run;
proc print data=have2 noobs; run;
proc print data=want noobs; run;

proc compare base=have2 compare=want; run;

or the final output form WANT:


ID EPID_1 Start_1 End_1 DrugDays_1 Drug_1
2 201 2019-04-09 2019-04-27 15 B
2 205 2019-04-30 2019-05-10 12 C
2 207 2019-06-15 2019-06-30 5 A
3 301 2021-07-14 2021-07-19 4 A
4 401 2017-11-02 2017-11-19 10 C
4 402 2019-07-26 2019-11-11 46 C
5 501 2018-04-22 2018-04-25 3 B
5 503 2019-08-08 2019-08-15 4 A
6 601 2017-02-04 2017-02-26 15 A
6 603 2019-11-24 2019-12-28 4 B
7 701 2019-10-20 2019-11-10 10 A
gema
Calcite | Level 5
can you create first dataset if first.id and rest dataset else if not first.id. merge those two data sets by id if diff(any of the dates) is <= 1)

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2124 views
  • 0 likes
  • 3 in conversation