I have a list of adverse events (AEs), and I am supposed to find if a subject has any overlapping terms and output them into a report. I have been able to do so, but on rare occasions, I have a term that is not overlapping in some places but is in others and is outputting all records. I would appreciate some help finding a way to eliminate the records that are not truly overlapping.
For example, Subject 101-002 is showing that "Urinary tract infection" is overlapping, which is true for AESPID 2 & 4; however, AESPID=1 is not overlapping. I want to get rid of this.
SUBJECTNUMBERSTR | AESPID | LLT_NAME | AESTDAT | AEENDAT |
101-002 | 1 | Urinary tract infection | 03AUG2017 | 15AUG2017 |
101-002 | 2 | Urinary tract infection | 07SEP2017 | |
101-002 | 4 | Urinary tract infection | 03OCT2017 |
My method for getting to this point is:
Here is the code I have. Please feel free to modify to help me remove AESPID=1
data AEALL;
input @1 SUBJECTNUMBERSTR $7. @9 AESPID @13 LLT_NAME $30. @45 AESTDAT date9. @57 AEENDAT date9.;
format AESTDAT AEENDAT date9.;
cards;
101-001 1 Acute respiratory failure 05AUG2017 .
101-001 2 Pleural effusion 05AUG2017 .
101-001 3 Pneumonia 07AUG2017 14AUG2017
101-001 4 Septic shock 16AUG2017 22AUG2017
101-001 5 Pneumothorax traumatic 19AUG2017 06SEP2017
101-001 6 Anemia 20AUG2017 .
101-001 7 Pneumothorax traumatic 28AUG2017 09SEP2017
101-001 8 Pneumonia 29AUG2017 08SEP2017
101-001 9 Hypotension 04SEP2017 07SEP2017
101-001 10 Hypernatremia 12AUG2017 23AUG2017
101-001 11 Thrombocytosis 15AUG2017 20AUG2017
101-001 12 Paroxysmal atrial fibrillation 13SEP2017 .
101-001 13 Respiratory infection 15SEP2017 .
101-002 1 Urinary tract infection 03AUG2017 15AUG2017
101-002 2 Urinary tract infection 07SEP2017 .
101-002 3 Pressure ulcer 11SEP2017 .
101-002 4 Urinary tract infection 03OCT2017 .
;
run;
proc sort data=aeall;
by subjectnumberstr llt_name aestdat aeendat;
where llt_name^='';
run;
data duplicate;
set aeall;
by subjectnumberstr llt_name;
if first.llt_name & last.llt_name then delete;
run;
proc sort data=duplicate;
by subjectnumberstr llt_name aespid;
run;
data overlap;
set duplicate;
by subjectnumberstr llt_name aespid;
format prev_stdat prev_endat date9.;
prev_stdat=lag(aestdat);
prev_endat=lag(aeendat);
if first.llt_name then do;
prev_stdat=.;
prev_endat=.;
end;
else do;
if (prev_endat^=. & aestdat^=. & prev_stdat<=aestdat & aestdat<=prev_endat) or
(prev_endat=. & aestdat^=. & prev_stdat<=aestdat) or
(prev_endat^=. & aestdat^=. & aestdat=prev_endat+1) or
(aestdat=.) then flag=1;
end;
run;
proc sort data=overlap out=overlap_distinct (keep=subjectnumberstr llt_name) nodupkey;
by subjectnumberstr llt_name;
where flag=1;
run;
data final;
merge aeall overlap_distinct (in=a);
by subjectnumberstr llt_name;
if a;
run;
proc sort data=final nodupkey;
by subjectnumberstr LLT_NAME AESPID AESTDAT AEENDAT;
run;
I hope I'm not simplifying this too much, to determine when there is an overlap. You might need to fiddle with the conditions.
The idea is not to delete anything until the end. Instead, assign a new variable (GROUP) that will be the same for all instances that overlap with one another.
proc sort data=AEALL;
by SubjectNumberStr llt_name aestdat aeendat;
run;
data with_groups;
set AEALL;
by SubjectNumberStr llt_name aestdat;
prev_enddate = lag(aeendat);
if first.llt_name
or (first.llt_name=0 and (prev_enddate=. or aestdat <= prev_enddate))
then group + 1;
run;
If GROUP can be accurately defined, the records are easy to subset:
data overlapping;
set with_groups;
by group;
if first.group and last.group then delete;
run;
Note that you don't have to sort by GROUP ... the data set is already sorted. But the trick is to increment GROUP properly. See how close this comes to the result you want.
I hope I'm not simplifying this too much, to determine when there is an overlap. You might need to fiddle with the conditions.
The idea is not to delete anything until the end. Instead, assign a new variable (GROUP) that will be the same for all instances that overlap with one another.
proc sort data=AEALL;
by SubjectNumberStr llt_name aestdat aeendat;
run;
data with_groups;
set AEALL;
by SubjectNumberStr llt_name aestdat;
prev_enddate = lag(aeendat);
if first.llt_name
or (first.llt_name=0 and (prev_enddate=. or aestdat <= prev_enddate))
then group + 1;
run;
If GROUP can be accurately defined, the records are easy to subset:
data overlapping;
set with_groups;
by group;
if first.group and last.group then delete;
run;
Note that you don't have to sort by GROUP ... the data set is already sorted. But the trick is to increment GROUP properly. See how close this comes to the result you want.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.