BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cchubbard1963
Calcite | Level 5

I have a data set consisting of a longitudinal test results on a cohort of patients.  The data set consists of the following variables: patient ID ("ID), test results ("VALUE"), and test date ("DATE").   A fake sample data set with this structure is attached.  The goal of the research is to determine if there is a change in testing frequency over various time frames.  In order to develop a denominator of patients "at risk" in the definition of the testing rate, the researchers have developed the following criteria to determine if a patient should contribute to the denominator: 

 

(1) if a patient has two tests within 24 weeks, then they are considered "at risk" beginning on the date of the second test and contribute to the denominator. 

(2) If an at risk patient has a gap of 12 weeks without a test, then after 12 weeks they are no longer considered at risk and do not contribute to the denominator thereafter, except ... 

(3) Patients dropped from the at risk population because of criteria No. 2 can reenter the cohort, and be considered at risk at later time periods, if they meet criteria No. 1 in the subsequent time period.

 

So, patients can enter and leave the at risk population. 

 

For example, a patient has tests on Jan. 1, 2019, Jan 15, 2019, June 1, 2019 and July 1, 2019.  Based upon these criteria, this patient is considered at risk starting on Jan. 15, 2019 (second test within 24 weeks), and remains at risk until Apr 9, 2019 (12 weeks after 1/15/19), at which time they are no longer at risk, until July 1, 2019 (second test within 24 weeks), at which time they are again at risk.

 

I need to create a data set with a row for each patient for each month in the study period, that would look something like this:

 

IDMonthAt Risk
1256Jan 20191
1256Feb 20191
1256Mar 2019

1

1256Apr 2019

1

1256May 2019

0

 

(Patients are at risk for the month if they are at risk on any day of that month.)

 

If criteria Nos. 1 and 2 had the same time period (12 weeks or 24 weeks), I think I know how to program this.  However, the differing time periods have thwarted my efforts thus far.  

 

Any insights or suggestions would be greatly appreciated.

 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @cchubbard1963,

 


@cchubbard1963 wrote:

For example, a patient has tests on Jan. 1, 2019, Jan 15, 2019, June 1, 2019 and July 1, 2019.  Based upon these criteria, this patient is considered at risk starting on Jan. 15, 2019 (second test within 24 weeks), and remains at risk until Apr 9, 2019 (12 weeks after 1/15/19), at which time they are no longer at risk, until July 1, 2019 (second test within 24 weeks), at which time they are again at risk.


I take it that criterion (1) is not applied to the two tests on Jan 15, 2019 and June 1, 2019 (which are less than 24 weeks apart) because after the 12-week gap following Jan 15, 2019 the test on June 1, 2019 is regarded as the first test of a new series.

 

One way to tackle this type of problem is what Paul Dorfman calls the "paintbrush approach." For each patient the days at risk are marked in a sufficiently large array (like a "calendar"). In my suggestion below I set start and end dates covering the test dates of all patients in your HAVE dataset plus 12 weeks follow-up. (You could also make this dynamic, i.e., derive start and end from dataset HAVE.) I'm not sure if "the study period" you refer to is common for all patients or if each patient has an individual study period. In the latter case you would need to restrict the WANT dataset to those individual study periods. This is done in the "optional" part of the code shown below (your definition of the individual study periods may be different, though).

 

/* Define a common study period for all patients (incl. follow-up) */

%let start = %sysevalf('01JAN2018'd);
%let end   = %sysevalf('31DEC2020'd);

/* Create at-risk flag ATRISK in (0, 1) on patient-date level */

data _tmp(keep=id date atrisk) / view=_tmp;
array a[&start:&end] _temporary_;
do t=1 by 0 until(last.id);
  set have;
  by id date;
  /* if last.date then do; */  /* use this to treat multiple tests on the same date as one test */
    if .<r<date then do; /* note the final incrementation of r in the iterative DO loop below */
      t=1; /* reset "test counter" t after a 12-week gap */
      r=.;
    end;
    if t>=2 & intck('week',prevdate,date,'c')<24 then
      do r=coalesce(r,date) to intnx('week',date,12,'s')-1;
        a[r]=1;
      end;
    prevdate=date;
    t+1;
  /* end; */                   /* use this to treat multiple tests on the same date as one test */
end;
do date=&start to &end;
  atrisk=(a[date]=1);
  output;
end;
call missing(of a[*]);
run;

/* Aggregate at-risk flags on patient-month level */

proc summary data=_tmp;
by id date;
format date monyy7.;
var atrisk;
output out=want(drop=_:) max=;
run;

/* Optional: Restrict WANT dataset to individual study periods */

proc sql;
create table periods as 
select id, intnx('month',min(date),0,'b') as _start,
           intnx('month',intnx('week',max(date),12,'s')-1,0,'e') as _end
from have
group by id
order by id;
quit;

data final(drop=_:);
merge want periods;
by id;
if _start<=date<=_end;
run;

For validation purposes you may want to create a physical dataset instead of the view _TMP. This dataset might be large, as the number of observations equals the number of patients times the number of days in the overall study period (&end-&start+1), e.g., 48*1096=52608 for your sample data. For example, you may want to check if borderline cases (such as a test exactly 12 or 24 weeks after the previous one) are handled as required. With the above implementation a 12-week gap would mean at least one day with ATRISK=0 before the next test a new phase with ATRISK=1 begins (if any). Duplicate dates for a patient (there is one such case in your sample data: id=6149, date='21APR2019'd) are treated as separate tests. In particular, the patient will be considered "at risk" immediately on that date. If this is not desired (i.e., multiple tests on the same day should be treated as one test), just activate the IF-THEN-DO-END statements that I have commented out.

 

I haven't used variable VALUE in my code. So if there is a criterion regarding this variable (e.g., value>. for a valid test), this will need to be added. Similarly, I haven't considered missing dates (as these don't occur in your sample data and could be filtered easily).

View solution in original post

2 REPLIES 2
FreelanceReinh
Jade | Level 19

Hello @cchubbard1963,

 


@cchubbard1963 wrote:

For example, a patient has tests on Jan. 1, 2019, Jan 15, 2019, June 1, 2019 and July 1, 2019.  Based upon these criteria, this patient is considered at risk starting on Jan. 15, 2019 (second test within 24 weeks), and remains at risk until Apr 9, 2019 (12 weeks after 1/15/19), at which time they are no longer at risk, until July 1, 2019 (second test within 24 weeks), at which time they are again at risk.


I take it that criterion (1) is not applied to the two tests on Jan 15, 2019 and June 1, 2019 (which are less than 24 weeks apart) because after the 12-week gap following Jan 15, 2019 the test on June 1, 2019 is regarded as the first test of a new series.

 

One way to tackle this type of problem is what Paul Dorfman calls the "paintbrush approach." For each patient the days at risk are marked in a sufficiently large array (like a "calendar"). In my suggestion below I set start and end dates covering the test dates of all patients in your HAVE dataset plus 12 weeks follow-up. (You could also make this dynamic, i.e., derive start and end from dataset HAVE.) I'm not sure if "the study period" you refer to is common for all patients or if each patient has an individual study period. In the latter case you would need to restrict the WANT dataset to those individual study periods. This is done in the "optional" part of the code shown below (your definition of the individual study periods may be different, though).

 

/* Define a common study period for all patients (incl. follow-up) */

%let start = %sysevalf('01JAN2018'd);
%let end   = %sysevalf('31DEC2020'd);

/* Create at-risk flag ATRISK in (0, 1) on patient-date level */

data _tmp(keep=id date atrisk) / view=_tmp;
array a[&start:&end] _temporary_;
do t=1 by 0 until(last.id);
  set have;
  by id date;
  /* if last.date then do; */  /* use this to treat multiple tests on the same date as one test */
    if .<r<date then do; /* note the final incrementation of r in the iterative DO loop below */
      t=1; /* reset "test counter" t after a 12-week gap */
      r=.;
    end;
    if t>=2 & intck('week',prevdate,date,'c')<24 then
      do r=coalesce(r,date) to intnx('week',date,12,'s')-1;
        a[r]=1;
      end;
    prevdate=date;
    t+1;
  /* end; */                   /* use this to treat multiple tests on the same date as one test */
end;
do date=&start to &end;
  atrisk=(a[date]=1);
  output;
end;
call missing(of a[*]);
run;

/* Aggregate at-risk flags on patient-month level */

proc summary data=_tmp;
by id date;
format date monyy7.;
var atrisk;
output out=want(drop=_:) max=;
run;

/* Optional: Restrict WANT dataset to individual study periods */

proc sql;
create table periods as 
select id, intnx('month',min(date),0,'b') as _start,
           intnx('month',intnx('week',max(date),12,'s')-1,0,'e') as _end
from have
group by id
order by id;
quit;

data final(drop=_:);
merge want periods;
by id;
if _start<=date<=_end;
run;

For validation purposes you may want to create a physical dataset instead of the view _TMP. This dataset might be large, as the number of observations equals the number of patients times the number of days in the overall study period (&end-&start+1), e.g., 48*1096=52608 for your sample data. For example, you may want to check if borderline cases (such as a test exactly 12 or 24 weeks after the previous one) are handled as required. With the above implementation a 12-week gap would mean at least one day with ATRISK=0 before the next test a new phase with ATRISK=1 begins (if any). Duplicate dates for a patient (there is one such case in your sample data: id=6149, date='21APR2019'd) are treated as separate tests. In particular, the patient will be considered "at risk" immediately on that date. If this is not desired (i.e., multiple tests on the same day should be treated as one test), just activate the IF-THEN-DO-END statements that I have commented out.

 

I haven't used variable VALUE in my code. So if there is a criterion regarding this variable (e.g., value>. for a valid test), this will need to be added. Similarly, I haven't considered missing dates (as these don't occur in your sample data and could be filtered easily).

cchubbard1963
Calcite | Level 5
Thank you so much @FreelanceReinhard! I think that is exactly what I'm looking for and appears to work perfectly.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 632 views
  • 1 like
  • 2 in conversation