BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
djbateman
Lapis Lazuli | Level 10

I have 2 datasets: planned hospitalizations (PHOSP) and unplanned hospitalizations (UHOSP).  A subject can be found in both tables and have multiple hospitalizations in each table.  I want to compare the two tables to see if any of the planned dates overlap with unplanned dates and retain the overlapping records.

 

In my mind, I would transpose UHOSP twice: once for the start dates and again for the end dates.  Then I would merge PHOSP with the transposed start dates and transposed end dates by SUBJID.  I could then see for each PHOSP sequence if the start/end dates overlap with any of the UHOSP start/end dates by looping through the maximum number of possible UHOSP sequences.  I feel that there is a much easier, simpler, more efficient way to do this.

 

Below is some test data.  You can see that SUBJID=102-711-401 has no overlaps between PHOSP and UHSOP, so they will not show up in the output.  But SUBJID=102-715-401 does overlap between PHSEQ=1 and UHSEQ=2.  I would like to then output these 2 records stacked on top of each other.

 

Are there any ideas out there?

 

 

 

data phosp;
	length SUBJID $11;
	input SUBJID $ PHSEQ PHSTDT PHENDT;
	informat PHSTDT PHENDT date9.;
	format PHSTDT PHENDT date9.;
	cards;
102-711-401	1	07JAN2020	17JAN2020
102-711-401	2	02JUL2020	09JUL2020
102-715-401	1	01APR2020	15APR2020
102-715-401	2	13JAN2020	16JAN2020
;
run;
data uhosp;
	length SUBJID $11;
	input SUBJID $ UHSEQ UHSTDT UHENDT;
	informat UHSTDT UHENDT date9.;
	format UHSTDT UHENDT date9.;
	cards;
102-711-401	1	27JAN2019	06MAY2019
102-711-401	2	05MAR2020	09MAR2020
102-715-401	1	31OCT2019	03NOV2019
102-715-401	2	15JAN2020	19JAN2020
102-715-401	3	04JUL2020	12JUL2020
;
run;


proc transpose data=uhosp out=uhosp_st (drop=_name_) prefix=UHST;
	id uhseq;
	by subjid;
	var uhstdt;
run;
proc transpose data=uhosp out=uhosp_en (drop=_name_) prefix=UHEN;
	id uhseq;
	by subjid;
	var uhendt;
run;

proc sql noprint;
	select max(uhseq) into :maxseq from uhosp;
	%let maxseq=%sysfunc(compress(&maxseq.));
quit;
data compare;
	merge phosp uhosp_st uhosp_en;
	by subjid;
	array start {&maxseq.} UHST1-UHST&maxseq.;
	array end {&maxseq.} UHEN1-UHEN&maxseq.;
	do i=1 to &maxseq.;
		if start(i)<=phstdt<=end(i) | start(i)<=phendt<=end(i) then flag=i;
	end;
run;

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

 

I would suggest something more along these lines:

proc sql;
   create table overlap as
   select a.subjid, a.PHSEQ, a.PHSTDT, a.PHENDT
         ,b.UHSEQ, b.UHSTDT, b.UHENDT 
   from  phosp as a
         left join
         uhosp as b
         on a.subjid=b.subjid
   where  ( a.PHSTDT le b.UHSTDT le a.PHENDT)
       or ( a.PHSTDT le b.UHENDT le a.PHENDT) 
       or ( b.UHSTDT le a.PHSTDT le b.UHENDT)
       or ( b.UHSTDT le a.PHENDT le b.UHENDT) 
   ;
quit;

Which gets ungainly if comparing many more intervals but  no macro variables needed so a bit easier to follow.

Since you are only interested in overlaps starting with a left or right join on the id means only matched ids are considered at all. The WHERE filters on the date ranges.

View solution in original post

3 REPLIES 3
ballardw
Super User

 

I would suggest something more along these lines:

proc sql;
   create table overlap as
   select a.subjid, a.PHSEQ, a.PHSTDT, a.PHENDT
         ,b.UHSEQ, b.UHSTDT, b.UHENDT 
   from  phosp as a
         left join
         uhosp as b
         on a.subjid=b.subjid
   where  ( a.PHSTDT le b.UHSTDT le a.PHENDT)
       or ( a.PHSTDT le b.UHENDT le a.PHENDT) 
       or ( b.UHSTDT le a.PHSTDT le b.UHENDT)
       or ( b.UHSTDT le a.PHENDT le b.UHENDT) 
   ;
quit;

Which gets ungainly if comparing many more intervals but  no macro variables needed so a bit easier to follow.

Since you are only interested in overlaps starting with a left or right join on the id means only matched ids are considered at all. The WHERE filters on the date ranges.

djbateman
Lapis Lazuli | Level 10
I knew SQL could come to my rescue! There is so much in SQL that I don't think of. I just rely too heavily on data step programming. I need to branch out. But this is much easier to follow and makes my code look so much cleaner. Thank you so much!
ballardw
Super User

@djbateman wrote:
I knew SQL could come to my rescue! There is so much in SQL that I don't think of. I just rely too heavily on data step programming. I need to branch out. But this is much easier to follow and makes my code look so much cleaner. Thank you so much!

One of the main things to indicate SQL might be in the solution is that you have multiples of the identification variable(s) in both sets. That means that a simple data step merge won't work. Second is that you need to compare ALL of the identification values from one with all of the other in the second set.

 

A data step approach that might work would be rename the date variables when combining the data sets with SET statement; add a source variable using either the In data set option or the INDSNAME set option; sort by id and the start date and then look for changes from source to source for the id but that would be bit trickier to confirm the overlap with use of LAG values.

If you haven't use the INDSNAME option a pseudo example:

 

Data new;
    set data1 data2 indsname=somevar;
   length source $ 41;
   source = somevar;
run;

I use a length of 41 because the SAS temporary variable created by the indsname would have the library.datasetname. Which has 8 characters for the library, 1 for the . and 32 for the data set name: 8+1+32 = 41. You may not need all of them when you copy the code from another project your data set names don't get truncated.

Data set option approach:

data new;
   set data1 (in=in1)
        data2 (in=in2)
   ;
   if in1 then source='data1';
   else if in2 then source='data2';
run;

Or create source as numeric an any of a number of ways. Slick but not obvious is something like:

source = in1*1 + in2*2;

The in variables are boolean 1/0 when the current record comes from the indicated data set. The main advantage over this approach is you can easily add source values as long as you get the correct in variable and the order value. I know I had least one project where I wanted this indicator to be numeric but don't remember the exact reason why at the moment.

Note IN variables and the INDSNAME variables are all temporary and will be dropped by SAS. If you need to keep the information you are responsible for assigning it to a new variable to keep in the data.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1569 views
  • 0 likes
  • 2 in conversation