BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
PharmlyDoc
Quartz | Level 8

I am trying to flag patients with nephropathy, which is defined as having >= two measures of eGFR less than 60, separated by at least 90 days without any intervening values >= 60. 

 

I am able to flag cases when two eGFRs are  <60 and occurring 90 or more days apart, but the tricky part, which I can't figure out, is how to exclude cases with an intervening eGFR >= 60.

 

I want a list of patients who meet the criteria, which are PAT_ID: 904901, 904902, and 904903. 

Patient 904901 has nephropathy because eGFR <60 on 31MAR2020 and eGFR < 60 on 31AUG2020.

Patient 904902 has nephropathy because eGFR <60 on 31MAR2020 and eGFR < 60 on 31JUL2020.

Patient 904903 has nephropathy because eGFR <60 on 20FEB2021 and eGFR < 60 on 25JUN2021.

 

Note that patient 904906 does not meet the criteria for nephropathy because although the eGFRs on 31MAR2020 and 31AUG2020 are <60, they have an intervening eGFR >=60 between those dates. 

 

 

data DATES_EGFR;
infile datalines delimiter=',';
input PAT_ID $ eGFR 4.1 RESULT_TIME ANYDTDTM19. ;
format eGFR 4.1;
format RESULT_TIME DATETIME19.;
datalines;
904901,59.1,31Mar2020:15:30:00
904901,56.0,31AUG2020:10:00:00
904901,65.0,14OCT2020:09:30:00
904901,66.6,10JAN2021:13:45:00
904902,59.1,31Mar2020:15:30:00
904902,56.0,31JUL2020:10:00:00
904902,65.0,14OCT2020:09:30:00
904902,66.6,10NOV2020:13:45:00
904903,59.1,31Mar2020:15:30:00
904903,65.0,14APR2020:09:30:00
904903,56.0,31JUL2020:10:00:00
904903,66.6,10NOV2020:13:45:00
904903,57.0,20FEB2021:13:45:00
904903,56.6,25JUN2021:13:45:00
904904,66.6,10NOV2020:13:45:00
904904,70.0,20FEB2021:13:45:00
904904,64.7,25JUN2021:13:45:00
904905,80.0,25JUN2021:13:45:00
904906,50.5,31MAR2020:15:30:00
904906,62.5,26APR2020:15:30:00
904906,56.3,31AUG2020:10:00:00
;
run;


proc sql;
select
egfr.*
,case when exists
(select 1
from work.DATES_EGFR egfr2
where egfr.PAT_ID=egfr2.PAT_ID
and datdif(datepart(egfr2.RESULT_TIME), datepart(egfr.RESULT_TIME), 'ACT/ACT') >= 90
and egfr2.eGFR < 60
and egfr.eGFR < 60
) THEN 1 ELSE 0 END as NEPHROPATHY 
from work.DATES_EGFR egfr;
quit;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Proc sql, and SQL in general are not really well suited to anything that requires specific "order". It may be possible but generally the code is long, ugly, convoluted and fragile, meaning a very minor change to the rules can completely break what was working.

 

I may be missing some nuances but see if this gets you started.

 

proc sort data=DATES_EGFR;
   by PAT_ID RESULT_TIME;
run;

data want;
   set DATES_EGFR;
   by pat_id;
   retain eGFR_60  date_egfr_60;
   if first.pat_id then call missing(eGFR_60,  date_egfr_60);
   NEPHROPATHY=0;
   if eGFR< 60 then do;
      if missing(egfr_60) then do;
         /* first < 60 in sequence*/
         egfr_60 =  eGFR;
         date_egfr_60 = RESULT_TIME;
      end;
      else if intck('dtday',date_egfr_60,RESULT_TIME) ge 90 then do;
         /* interval over 90 days*/
         NEPHROPATHY =1;
      end;
      else if intck('dtday',date_egfr_60,RESULT_TIME) < 90 then do;
         /* interval too short reset the comparison variables
            don't set flag
         */
      end;
   end;
   if eGFR ge 60 then do;
      /* reset the comparison variables
         to missing as no longer valid interval*/
      call missing(eGFR_60,  date_egfr_60) ;
   end;
   label
      egfr_60 ='eGFR value at start of series of < 60 values'
      date_egfr_60 ='Time of start of series < 60'
   ;
   format  date_egfr_60 datetime19.;
run;

I think you may have seen Retain before to keep values across data step boundaries and use of the First. variables.

In this case the Retained values are the start information for a series of records where the eGFR < 60.

If you aren't familiar with Call missing, the function will set a list of variables to missing. So the first use resets between Pat_id record groups. The second use resets when you have a value > 60 that breaks a series of values.

The Intck function returns the number of date, time or datetime interval boundaries occur between two values.

 

If this were my data with the range of values you show for the Result time I would ignore the time component and use date values instead. That would mean use of the Date9. informat to read the date time value which would ignore the time portion. Then use Intck('day',... ) instead of 'dtday', which is the instruction to create day intervals from datetime values.

 

 

 

View solution in original post

3 REPLIES 3
ballardw
Super User

Proc sql, and SQL in general are not really well suited to anything that requires specific "order". It may be possible but generally the code is long, ugly, convoluted and fragile, meaning a very minor change to the rules can completely break what was working.

 

I may be missing some nuances but see if this gets you started.

 

proc sort data=DATES_EGFR;
   by PAT_ID RESULT_TIME;
run;

data want;
   set DATES_EGFR;
   by pat_id;
   retain eGFR_60  date_egfr_60;
   if first.pat_id then call missing(eGFR_60,  date_egfr_60);
   NEPHROPATHY=0;
   if eGFR< 60 then do;
      if missing(egfr_60) then do;
         /* first < 60 in sequence*/
         egfr_60 =  eGFR;
         date_egfr_60 = RESULT_TIME;
      end;
      else if intck('dtday',date_egfr_60,RESULT_TIME) ge 90 then do;
         /* interval over 90 days*/
         NEPHROPATHY =1;
      end;
      else if intck('dtday',date_egfr_60,RESULT_TIME) < 90 then do;
         /* interval too short reset the comparison variables
            don't set flag
         */
      end;
   end;
   if eGFR ge 60 then do;
      /* reset the comparison variables
         to missing as no longer valid interval*/
      call missing(eGFR_60,  date_egfr_60) ;
   end;
   label
      egfr_60 ='eGFR value at start of series of < 60 values'
      date_egfr_60 ='Time of start of series < 60'
   ;
   format  date_egfr_60 datetime19.;
run;

I think you may have seen Retain before to keep values across data step boundaries and use of the First. variables.

In this case the Retained values are the start information for a series of records where the eGFR < 60.

If you aren't familiar with Call missing, the function will set a list of variables to missing. So the first use resets between Pat_id record groups. The second use resets when you have a value > 60 that breaks a series of values.

The Intck function returns the number of date, time or datetime interval boundaries occur between two values.

 

If this were my data with the range of values you show for the Result time I would ignore the time component and use date values instead. That would mean use of the Date9. informat to read the date time value which would ignore the time portion. Then use Intck('day',... ) instead of 'dtday', which is the instruction to create day intervals from datetime values.

 

 

 

PharmlyDoc
Quartz | Level 8

@ballardw You submitted an answer when I was commenting in an update to the data. 

 

Updated data: 

data DATES_EGFR;
infile datalines delimiter=',';
input PAT_ID $ eGFR 4.1 RESULT_TIME ANYDTDTM19. ;
format eGFR 4.1;
format RESULT_TIME DATETIME19.;
datalines;
904901,59.1,31Mar2020:15:30:00
904901,56.0,31AUG2020:10:00:00
904901,65.0,14OCT2020:09:30:00
904901,66.6,10JAN2021:13:45:00
904902,59.1,31Mar2020:15:30:00
904902,56.0,31JUL2020:10:00:00
904902,65.0,14OCT2020:09:30:00
904902,66.6,10NOV2020:13:45:00
904903,59.1,31Mar2020:15:30:00
904903,65.0,14APR2020:09:30:00
904903,56.0,31JUL2020:10:00:00
904903,66.6,10NOV2020:13:45:00
904903,57.0,20FEB2021:13:45:00
904903,56.6,25JUN2021:13:45:00
904904,66.6,10NOV2020:13:45:00
904904,70.0,20FEB2021:13:45:00
904904,64.7,25JUN2021:13:45:00
904905,80.0,25JUN2021:13:45:00
904906,50.5,31MAR2020:15:30:00
904906,62.5,26APR2020:15:30:00
904906,56.3,31AUG2020:10:00:00
904907,50.5,31MAR2020:15:30:00
904907,51.0,26APR2020:15:30:00
904907,62.0,31AUG2020:10:00:00
904907,61.0,15SEP2020:10:00:00
904907,55.0,24NOV2020:10:00:00
;
run;

I'll give your code a go. 

Here was one solution that I was working on (but it's proc sql, and doesn't work for excluding patient 904907):

proc sort data=dates_egfr;
by PAT_ID RESULT_TIME;
run;

data DATES_EGFR_rownum;
set DATES_EGFR;
by PAT_ID RESULT_TIME;
if first.PAT_ID then LINE=1;
else LINE+1;
run;

proc sql;
select
distinct PAT_ID
from DATES_EGFR_rownum egfr
where PAT_ID in (
select egfr2.PAT_ID
from DATES_EGFR_rownum egfr2
where egfr.PAT_ID=egfr2.PAT_ID
and datdif(datepart(egfr.RESULT_TIME), datepart(egfr2.RESULT_TIME), 'ACT/ACT') >= 90
and egfr.LINE < egfr2.LINE and egfr.eGFR<60 and egfr2.eGFR <60)
and PAT_ID not in
(select egfr3.PAT_ID
from DATES_EGFR_rownum egfr3
where egfr.PAT_ID=egfr3.PAT_ID
and datdif(datepart(egfr3.RESULT_TIME), datepart(egfr3.RESULT_TIME), 'ACT/ACT') < 90
and egfr.LINE < egfr3.LINE < egfr.LINE+2  and egfr3.eGFR>=60 )
;
quit;

 

SASKiwi
PROC Star

Here's my attempt. I agree with @ballardw that a DATA step version is easier:

data Want;
  set DATES_EGFR;
  by PAT_ID;
  retain eGFR_Event1 eGFR_Event2;
  format Result_Date eGFR_Event1 eGFR_Event2 date9.;
  if first.PAT_ID then do;
    eGFR_Event1 = .;
    eGFR_Event2 = .;
  end;
  Result_Date = datepart(RESULT_TIME);
  if eGFR < 60 and eGFR_Event1 = . then eGFR_Event1 = Result_Date;
  if eGFR >= 60 and eGFR_Event1 ne . and Result_Date > eGFR_Event1 then eGFR_Event1 = .;
  if eGFR < 60 and eGFR_Event1 ne . and Result_Date > eGFR_Event2 then eGFR_Event2 = Result_Date;
  if not first.PAT_ID and eGFR_Event2 - eGFR_Event1 >=90 then NEPHROPATHY = 1;
  else NEPHROPATHY = 0;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 341 views
  • 0 likes
  • 3 in conversation