Solved: Re: Defining at risk population in longitudinal data set

cchubbard1963 · Posted 10-28-2021 03:42 PM

I have a data set consisting of a longitudinal test results on a cohort of patients. The data set consists of the following variables: patient ID ("ID), test results ("VALUE"), and test date ("DATE"). A fake sample data set with this structure is attached. The goal of the research is to determine if there is a change in testing frequency over various time frames. In order to develop a denominator of patients "at risk" in the definition of the testing rate, the researchers have developed the following criteria to determine if a patient should contribute to the denominator:

(1) if a patient has two tests within 24 weeks, then they are considered "at risk" beginning on the date of the second test and contribute to the denominator.

(2) If an at risk patient has a gap of 12 weeks without a test, then after 12 weeks they are no longer considered at risk and do not contribute to the denominator thereafter, except ...

(3) Patients dropped from the at risk population because of criteria No. 2 can reenter the cohort, and be considered at risk at later time periods, if they meet criteria No. 1 in the subsequent time period.

So, patients can enter and leave the at risk population.

For example, a patient has tests on Jan. 1, 2019, Jan 15, 2019, June 1, 2019 and July 1, 2019. Based upon these criteria, this patient is considered at risk starting on Jan. 15, 2019 (second test within 24 weeks), and remains at risk until Apr 9, 2019 (12 weeks after 1/15/19), at which time they are no longer at risk, until July 1, 2019 (second test within 24 weeks), at which time they are again at risk.

I need to create a data set with a row for each patient for each month in the study period, that would look something like this:

ID	Month	At Risk
1256	Jan 2019	1
1256	Feb 2019	1
1256	Mar 2019	1
1256	Apr 2019	1
1256	May 2019	0

(Patients are at risk for the month if they are at risk on any day of that month.)

If criteria Nos. 1 and 2 had the same time period (12 weeks or 24 weeks), I think I know how to program this. However, the differing time periods have thwarted my efforts thus far.

Any insights or suggestions would be greatly appreciated.

Thanks in advance!

FreelanceReinh · Posted 10-30-2021 08:48 AM

Hello @cchubbard1963,

@cchubbard1963 wrote:

For example, a patient has tests on Jan. 1, 2019, Jan 15, 2019, June 1, 2019 and July 1, 2019. Based upon these criteria, this patient is considered at risk starting on Jan. 15, 2019 (second test within 24 weeks), and remains at risk until Apr 9, 2019 (12 weeks after 1/15/19), at which time they are no longer at risk, until July 1, 2019 (second test within 24 weeks), at which time they are again at risk.

I take it that criterion (1) is not applied to the two tests on Jan 15, 2019 and June 1, 2019 (which are less than 24 weeks apart) because after the 12-week gap following Jan 15, 2019 the test on June 1, 2019 is regarded as the first test of a new series.

One way to tackle this type of problem is what Paul Dorfman calls the "paintbrush approach." For each patient the days at risk are marked in a sufficiently large array (like a "calendar"). In my suggestion below I set start and end dates covering the test dates of all patients in your HAVE dataset plus 12 weeks follow-up. (You could also make this dynamic, i.e., derive start and end from dataset HAVE.) I'm not sure if "the study period" you refer to is common for all patients or if each patient has an individual study period. In the latter case you would need to restrict the WANT dataset to those individual study periods. This is done in the "optional" part of the code shown below (your definition of the individual study periods may be different, though).

/* Define a common study period for all patients (incl. follow-up) */

%let start = %sysevalf('01JAN2018'd);
%let end   = %sysevalf('31DEC2020'd);

/* Create at-risk flag ATRISK in (0, 1) on patient-date level */

data _tmp(keep=id date atrisk) / view=_tmp;
array a[&start:&end] _temporary_;
do t=1 by 0 until(last.id);
  set have;
  by id date;
  /* if last.date then do; */  /* use this to treat multiple tests on the same date as one test */
    if .<r<date then do; /* note the final incrementation of r in the iterative DO loop below */
      t=1; /* reset "test counter" t after a 12-week gap */
      r=.;
    end;
    if t>=2 & intck('week',prevdate,date,'c')<24 then
      do r=coalesce(r,date) to intnx('week',date,12,'s')-1;
        a[r]=1;
      end;
    prevdate=date;
    t+1;
  /* end; */                   /* use this to treat multiple tests on the same date as one test */
end;
do date=&start to &end;
  atrisk=(a[date]=1);
  output;
end;
call missing(of a[*]);
run;

/* Aggregate at-risk flags on patient-month level */

proc summary data=_tmp;
by id date;
format date monyy7.;
var atrisk;
output out=want(drop=_:) max=;
run;

/* Optional: Restrict WANT dataset to individual study periods */

proc sql;
create table periods as 
select id, intnx('month',min(date),0,'b') as _start,
           intnx('month',intnx('week',max(date),12,'s')-1,0,'e') as _end
from have
group by id
order by id;
quit;

data final(drop=_:);
merge want periods;
by id;
if _start<=date<=_end;
run;

For validation purposes you may want to create a physical dataset instead of the view _TMP. This dataset might be large, as the number of observations equals the number of patients times the number of days in the overall study period (&end-&start+1), e.g., 48*1096=52608 for your sample data. For example, you may want to check if borderline cases (such as a test exactly 12 or 24 weeks after the previous one) are handled as required. With the above implementation a 12-week gap would mean at least one day with ATRISK=0 before the next test ~~a new phase with ATRISK=1 begins~~ (if any). Duplicate dates for a patient (there is one such case in your sample data: id=6149, date='21APR2019'd) are treated as separate tests. In particular, the patient will be considered "at risk" immediately on that date. If this is not desired (i.e., multiple tests on the same day should be treated as one test), just activate the IF-THEN-DO-END statements that I have commented out.

I haven't used variable VALUE in my code. So if there is a criterion regarding this variable (e.g., value>. for a valid test), this will need to be added. Similarly, I haven't considered missing dates (as these don't occur in your sample data and could be filtered easily).

View solution in original post

FreelanceReinh · Posted 10-30-2021 08:48 AM

Hello @cchubbard1963,

@cchubbard1963 wrote:

For example, a patient has tests on Jan. 1, 2019, Jan 15, 2019, June 1, 2019 and July 1, 2019. Based upon these criteria, this patient is considered at risk starting on Jan. 15, 2019 (second test within 24 weeks), and remains at risk until Apr 9, 2019 (12 weeks after 1/15/19), at which time they are no longer at risk, until July 1, 2019 (second test within 24 weeks), at which time they are again at risk.

I take it that criterion (1) is not applied to the two tests on Jan 15, 2019 and June 1, 2019 (which are less than 24 weeks apart) because after the 12-week gap following Jan 15, 2019 the test on June 1, 2019 is regarded as the first test of a new series.

One way to tackle this type of problem is what Paul Dorfman calls the "paintbrush approach." For each patient the days at risk are marked in a sufficiently large array (like a "calendar"). In my suggestion below I set start and end dates covering the test dates of all patients in your HAVE dataset plus 12 weeks follow-up. (You could also make this dynamic, i.e., derive start and end from dataset HAVE.) I'm not sure if "the study period" you refer to is common for all patients or if each patient has an individual study period. In the latter case you would need to restrict the WANT dataset to those individual study periods. This is done in the "optional" part of the code shown below (your definition of the individual study periods may be different, though).

/* Define a common study period for all patients (incl. follow-up) */

%let start = %sysevalf('01JAN2018'd);
%let end   = %sysevalf('31DEC2020'd);

/* Create at-risk flag ATRISK in (0, 1) on patient-date level */

data _tmp(keep=id date atrisk) / view=_tmp;
array a[&start:&end] _temporary_;
do t=1 by 0 until(last.id);
  set have;
  by id date;
  /* if last.date then do; */  /* use this to treat multiple tests on the same date as one test */
    if .<r<date then do; /* note the final incrementation of r in the iterative DO loop below */
      t=1; /* reset "test counter" t after a 12-week gap */
      r=.;
    end;
    if t>=2 & intck('week',prevdate,date,'c')<24 then
      do r=coalesce(r,date) to intnx('week',date,12,'s')-1;
        a[r]=1;
      end;
    prevdate=date;
    t+1;
  /* end; */                   /* use this to treat multiple tests on the same date as one test */
end;
do date=&start to &end;
  atrisk=(a[date]=1);
  output;
end;
call missing(of a[*]);
run;

/* Aggregate at-risk flags on patient-month level */

proc summary data=_tmp;
by id date;
format date monyy7.;
var atrisk;
output out=want(drop=_:) max=;
run;

/* Optional: Restrict WANT dataset to individual study periods */

proc sql;
create table periods as 
select id, intnx('month',min(date),0,'b') as _start,
           intnx('month',intnx('week',max(date),12,'s')-1,0,'e') as _end
from have
group by id
order by id;
quit;

data final(drop=_:);
merge want periods;
by id;
if _start<=date<=_end;
run;

For validation purposes you may want to create a physical dataset instead of the view _TMP. This dataset might be large, as the number of observations equals the number of patients times the number of days in the overall study period (&end-&start+1), e.g., 48*1096=52608 for your sample data. For example, you may want to check if borderline cases (such as a test exactly 12 or 24 weeks after the previous one) are handled as required. With the above implementation a 12-week gap would mean at least one day with ATRISK=0 before the next test ~~a new phase with ATRISK=1 begins~~ (if any). Duplicate dates for a patient (there is one such case in your sample data: id=6149, date='21APR2019'd) are treated as separate tests. In particular, the patient will be considered "at risk" immediately on that date. If this is not desired (i.e., multiple tests on the same day should be treated as one test), just activate the IF-THEN-DO-END statements that I have commented out.

I haven't used variable VALUE in my code. So if there is a criterion regarding this variable (e.g., value>. for a valid test), this will need to be added. Similarly, I haven't considered missing dates (as these don't occur in your sample data and could be filtered easily).

cchubbard1963 · Posted 11-03-2021 03:09 PM

Thank you so much @FreelanceReinhard! I think that is exactly what I'm looking for and appears to work perfectly.

Defining at risk population in longitudinal data set

Re: Defining at risk population in longitudinal data set

Re: Defining at risk population in longitudinal data set

Re: Defining at risk population in longitudinal data set

Click image to register for webinar

Classroom Training Available!