I'm currently working on a project where I need to identify and process records of deterioration events within a clinical dataset. My objective is to first identify initial deterioration events and then track subsequent confirmations of deterioration at least 14 days later. Additionally, I need to ensure that there is a prior evaluable record within 48 days of the identified deterioration. Here is the specs
Date of first deterioration: earliest ADC30.ADT where ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (i.e., date of record 1 in column 'Derivation') | ADT-STARTDT+1 | 0 | "Deterioration" | Clinically meaningful deterioration (compared to PRO baseline) in scale score[a] not occurring after 2 or more missed PRO assessments (i.e., 48 days) | Time to confirmed deterioration will be defined as the time from randomization until the date of the first clinically meaningful deterioration[b] that is confirmed at a subsequent assessment (at least 14 days apart), regardless of whether the patient withdraws from study treatment or receives another anti-cancer therapy prior to deterioration (i.e., date of deterioration event or censoring – date of randomization + 1). Patients with a single deterioration and no further assessments will be treated as deteriorated in the analysis. |
4) Else if there is a record with ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (record 1) and if the subsequent record (earliest record that is at least 14 days after record 1) has ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (record 2). In addition, record 1 must have a prior evaluable record within 48 days (i.e., not missing(ADC30.AVAL) and ADC30.ADT <= date of record 1 <= ADC30.ADT+48). 5) Else if there is a record with ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (record 1) and there are no further assessments. In addition, record 1 must have a prior evaluable record within 48 days (i.e., not missing(ADC30.AVAL) and ADC30.ADT <= date of record 1 <= ADC30.ADT+48). |
max[ADSL.RANDDT, Date of last PRO assessment prior to record 1 where the score could be evaluated: ADC30.ADT where not missing(ADC30.AVAL)] | ADT-STARTDT+1 | 1 | "Deterioration after 2 or more missed assessments" | Clinically meaningful deterioration (compared to PRO baseline) in scale score [a] after 2 or more missed PRO assessments (i.e., 48 days) | Also, in case of first confirmed deterioration after 2 or more consecutive missed PRO assessment visits (i.e., date from last PRO assessment prior to 2 or more consecutive missed PRO assessment visits and date of first confirmed deterioration ≥ 48 days) the patient will be censored at the time of the last PRO assessment where the score could be evaluated prior to 2 or more consecutive missed visits. | 6) Else if there is a record with ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (record 1) and if the subsequent record (earliest record that is at least 14 days after record 1) has ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (record 2). 7) Else if there is a record with ADC30.CHGCAT1="Deterioration" and ADC30.ADT >= ADSL.RANDDT (record 1) and there are no further assessments. |
Here is my code deterfl and detefl2 are supposed to represent records for each condition. I think I've overcomplicated can anyone suggest how to streamline
data det_initial;
set ADC30_M;
if chgcat1 = "Deterioration" and adt >= randdt;
run;
proc sort data=ADC30_M;
by usubjid paramcd ADT;
run;
/* Identify records for scenarios 4 and 5 */
data det_confirmed;
if _N_ = 1 then do;
declare hash h (dataset: 'det_initial');
h.defineKey('usubjid', 'paramcd', 'adt');
h.defineDone();
end;
set ADC30_M;
by usubjid paramcd adt;
retain first_deterdt confirmed_flag;
format first_deterdt date9.;
/* Initialise the flag and date for each new group of usubjid and paramcd */
if first.usubjid or first.paramcd then do;
first_deterdt = .;
confirmed_flag = 0;
end;
/* Identify the initial deterioration */
rc = h.find();
if rc = 0 and confirmed_flag = 0 and adt >= randdt then do;
first_deterdt = adt;
confirmed_flag = 1;
end;
/* Check for a subsequent record at least 14 days later with 'Deterioration' */
if confirmed_flag = 1 and adt >= first_deterdt + 14 then do;
if adt > first_deterdt then do;
if chgcat1 = "Deterioration" then do;
output;
confirmed_flag = 0;
end;
else do;
confirmed_flag = 0; /* Reset flag if no confirmation */
end;
end;
end;
/* If last record in group, check if no further assessments and handle single deterioration */
if last.paramcd then do;
if confirmed_flag = 1 then do;
DETDESC = 'Deterioration'; /* Set event description to 'Deterioration' */
output;
end;
first_deterdt = .;
confirmed_flag = 0;
end;
run;
/*Ensure Prior Evaluable Record Within 48 Days */
proc sql;
create table det_prior as
select
a.usubjid,
a.paramcd,
a.adt as deterdt,
a.ady,
a.avisit,
a.DETDESC,
a.avisitn,
b.adt as prior_adt,
b.aval as prior_aval,
a.adt - b.adt as day_count,
case
when b.adt is null then 1 /* Censor if there is no prior evaluable record */
else 0
end as CNSR
from
det_confirmed as a
left join
ADC30_M as b
on
a.usubjid = b.usubjid and
a.paramcd = b.paramcd and
not missing(b.aval) and
b.adt <= a.adt and
b.adt >= a.adt - 48;
quit;
proc sort data=det_prior;
by usubjid avisitn paramcd deterdt;
run;
data deterioration1;
set det_prior;
length deterfl $1;
if day_count ne 0 then deterfl = "Y";
if deterfl = "Y";
keep usubjid paramcd deterfl deterdt ady avisit avisitn day_count DETDESC CNSR;
run;
proc sort data=ADC30_M;
by usubjid paramcd;
run;
data missed_assessments;
set ADC30_M;
by usubjid paramcd;
format last_adt date9.;
retain last_adt missed_count;
if first.usubjid or first.paramcd then do;
last_adt = .;
missed_count = 0;
end;
/* Check for missed assessments (i.e., no record within 48 days) */
if not missing(last_adt) and adt > last_adt + 48 then missed_count + 1;
/* Update last_adt only after checking for missed assessments */
last_adt = adt;
/* Create a flag for records with missed assessments */
if chgcat1 = "Deterioration" and missed_count >= 2 then missed_flag = 1;
else missed_flag = 0;
if last.paramcd then do;
last_adt = .;
missed_count = 0;
end;
run;
proc sort data=missed_assessments;
by usubjid paramcd adt;
run;
data ADC30_M_flagged;
merge ADC30_M(in=inA) missed_assessments(keep=usubjid paramcd adt missed_flag missed_count);
by usubjid paramcd adt;
if inA;
run;
data det1;
set ADC30_M_flagged;
if chgcat1 = "Deterioration" and adt >= randdt;
run;
proc sort data=det1;
by usubjid paramcd adt;
run;
proc sort data=ADC30_M_flagged;
by usubjid paramcd adt;
run;
data det2;
merge det1(in=inDet1) ADC30_M_flagged;
by usubjid paramcd adt;
retain first_deterdt second_deterdt third_deterdt fourth_deterdt fifth_deterdt sixth_deterdt
subsequent_adt1 subsequent_adt2 subsequent_adt3 subsequent_adt4 subsequent_adt5 subsequent_adt6
subsequent_chgcat1 subsequent_chgcat2 subsequent_chgcat3 subsequent_chgcat4 subsequent_chgcat5 subsequent_chgcat6
last_deterdt last_deterdt_prev_chgcat last_deterdt_is_deterioration missed_count deterioration_flag;
array deter_dt[6] first_deterdt second_deterdt third_deterdt fourth_deterdt fifth_deterdt sixth_deterdt;
array subsequent_adt_arr[6] subsequent_adt1 subsequent_adt2 subsequent_adt3 subsequent_adt4 subsequent_adt5 subsequent_adt6;
array subsequent_chgcat_arr[6] $40 subsequent_chgcat1 subsequent_chgcat2 subsequent_chgcat3 subsequent_chgcat4 subsequent_chgcat5 subsequent_chgcat6;
format first_deterdt second_deterdt third_deterdt fourth_deterdt fifth_deterdt sixth_deterdt
subsequent_adt1 subsequent_adt2 subsequent_adt3 subsequent_adt4 subsequent_adt5 subsequent_adt6
last_deterdt date9. subsequent_chgcat1 subsequent_chgcat2 subsequent_chgcat3 subsequent_chgcat4 subsequent_chgcat5 subsequent_chgcat6 $40.
last_deterdt_prev_chgcat $40. last_deterdt_is_deterioration $1. deterioration_flag $1.;
if first.usubjid or first.paramcd then do;
do i = 1 to 6;
deter_dt[i] = .;
subsequent_adt_arr[i] = .;
subsequent_chgcat_arr[i] = "";
end;
last_deterdt = .;
last_deterdt_prev_chgcat = "";
last_deterdt_is_deterioration = "N";
missed_count = 0;
deterioration_flag = "N";
end;
if inDet1 then do;
if deter_dt[1] = . then deter_dt[1] = adt;
end;
/* Capture each subsequent deterioration and its subsequent chgcat1 */
do i = 1 to 6;
if deter_dt[i] ne . and adt >= deter_dt[i] + 14 then do;
if subsequent_adt_arr[i] = . then do;
subsequent_adt_arr[i] = adt;
subsequent_chgcat_arr[i] = chgcat1;
end;
if i < 6 and deter_dt[i + 1] = . and adt >= subsequent_adt_arr[i] + 14 then do;
deter_dt[i + 1] = adt;
end;
end;
end;
/* Check if the last record in the group has chgcat1 = "Deterioration" */
if last.paramcd then do;
if chgcat1 = "Deterioration" then do;
last_deterdt = adt;
last_deterdt_is_deterioration = "Y";
end;
else last_deterdt_is_deterioration = "N";
if last_deterdt ne . and adt <= last_deterdt + 48 and chgcat1 = "Deterioration" then last_deterdt_prev_chgcat = chgcat1;
end;
/* Set flag if current and next chgcat1 are "Deterioration" */
do i = 1 to 6;
if subsequent_chgcat_arr[i] = "Deterioration" then do;
if i < 6 and subsequent_chgcat_arr[i + 1] = "Deterioration" then do;
deterioration_flag = "Y";
end;
else if i = 6 and chgcat1 = "Deterioration" then do;
deterioration_flag = "Y";
end;
end;
end;
if last.paramcd then do;
if not missing(first_deterdt) and not missing(subsequent_adt1) and missed_flag = 1 then do;
output;
end;
do i = 1 to 6;
deter_dt[i] = .;
subsequent_adt_arr[i] = .;
subsequent_chgcat_arr[i] = "";
end;
last_deterdt = .;
last_deterdt_prev_chgcat = "";
last_deterdt_is_deterioration = "N";
missed_count = 0;
deterioration_flag = "N";
end;
run;
proc sort data=det2;
by usubjid paramcd first_deterdt;
run;
data deterioration2;
set det2;
if deterioration_flag ="Y" or last_deterdt_is_deterioration ="Y" then do;
length deterfl2 $1;
deterfl2 = "Y";
output;
end;
*keep usubjid paramcd deterfl2 adt ady avisit avisitn missed_flag last_deterdt_is_deterioration deterioration_flag;
run;
I'm still not understanding everything, but I think I would approach this as a problem where you need to do lags (look-backs) and leads (look-aheads).
I'm a big fan of Mark Keintz's papers on lags and leads, e.g. : https://www.lexjansen.com/wuss/2014/24_Final_Paper_PDF.pdf
I took your first two example steps, and made one example with two subjects:
data have;
length COMMENT $100;
infile datalines dsd dlm='|';
input USUBJID :$10. ADT :date9. PARAMCD :$2. AVAL :8.2 ABLFL :$1. CHGCAT1 :$20. RANDDT :date9. COMMENT :$100.;
format ADT RANDDT date9.;
datalines;
1001|07DEC2022|CO|33.3333333|Y| |01DEC2022|
1001|15JAN2023|CO|100.0||Deterioration|01DEC2022|Use this ADT
1001|06FEB2023|CO|100.0||Deterioration|01DEC2022|confirms initial chgcat1
1001|12APR2023|CO|100.0||Deterioration|01DEC2022|
1001|04MAY2023|CO|100.0||Deterioration|01DEC2022|
1001|08JUL2023|CO|100.0||Deterioration|01DEC2022|
1001|13AUG2023|CO|100.0||Deterioration|01DEC2022|
1001|05SEP2023|CO|100.0||Deterioration|01DEC2022|
1001|26SEP2023|CO|100.0||Deterioration|01DEC2022|
1001|05NOV2023|CO|66.6666667||Deterioration|01DEC2022|
1001|18DEC2023|CO|100.0||Deterioration|01DEC2022|
1001|10JAN2024|CO|66.6666667||Deterioration|01DEC2022|
1001|28JAN2024|CO|100.0||Deterioration|01DEC2022|
2001|07DEC2022|EF|41.6666667|Y|No change|01DEC2022|
2001|15JAN2023|EF|50||No change|01DEC2022|
2001|06FEB2023|EF|66.6666667||Improvement|01DEC2022|
2001|12APR2023|EF|41.6666667||No change|01DEC2022|
2001|04MAY2023|EF|25||No change|01DEC2022|
2001|08JUL2023|EF|25||Deterioration|01DEC2022|No prior record within 48 days|
2001|13AUG2023|EF|33.3333333||Deterioration|01DEC2022|No confirmation on next record|
2001|05SEP2023|EF|33.3333333||No change|01DEC2022|
2001|26SEP2023|EF|33.3333333||No change|01DEC2022|
2001|05NOV2023|EF|33.3333333||No change|01DEC2022|
2001|18DEC2023|EF|25||Deterioration|01DEC2022|No confirmation on next record|
2001|10JAN2024|EF|58.3333333||Improvement|01DEC2022|
2001|28JAN2024|EF|25||Deterioration|01DEC2022|Valid deterioration record as prior evaluable record with 48 days
;
run;
I think your basic logic is something like: Calculate DeteriorationDate if within a subjectid-paramcd:
For that you would need the lead of chgcat1, the lead of adt, and the lag of adt.
You could approach it like below (this assumes data are already sorted chronologically):
data want ;
set have ;
by usubjid paramcd ;
*lead ;
if eof1=0 then set have (firstobs=2 keep=chgcat1 adt rename=(chgcat1=leadchgcat adt=leadadt)) end=eof1;
if last.paramcd then call missing(leadchgcat,leadadt) ;
*lag ;
lagadt=ifn(first.paramcd=0,lag(adt),.) ;
if first.paramcd then call missing(_HasDeteriorationDate) ;
if NOT (_HasDeteriorationDate) then do ;
if (chgcat1=leadchgcat="Deterioration")
and ((leadadt-adt) > 14)
and (.Z < (adt - lagadt) < 48)
then do ;
DeteriorationDate=adt ;
_HasDeteriorationDate++1 ;
end ;
end ;
drop _: ;
format lagadt DeteriorationDate date9. ;
run ;
proc print data=want;
var usubjid paramcd chgcat1 adt leadchgcat leadadt lagadt DeteriorationDate;
by usubjid paramcd ;
id usubjid paramcd ;
run ;
This will flag one DeterioriationDate:
Deterioration USUBJID PARAMCD USUBJID PARAMCD CHGCAT1 ADT leadchgcat leadadt lagadt Date 1001 CO 1001 CO 07DEC2022 Deterioration 15JAN2023 . . 1001 CO Deterioration 15JAN2023 Deterioration 06FEB2023 07DEC2022 15JAN2023 1001 CO Deterioration 06FEB2023 Deterioration 12APR2023 15JAN2023 . 1001 CO Deterioration 12APR2023 Deterioration 04MAY2023 06FEB2023 . 1001 CO Deterioration 04MAY2023 Deterioration 08JUL2023 12APR2023 . 1001 CO Deterioration 08JUL2023 Deterioration 13AUG2023 04MAY2023 . 1001 CO Deterioration 13AUG2023 Deterioration 05SEP2023 08JUL2023 . 1001 CO Deterioration 05SEP2023 Deterioration 26SEP2023 13AUG2023 . 1001 CO Deterioration 26SEP2023 Deterioration 05NOV2023 05SEP2023 . 1001 CO Deterioration 05NOV2023 Deterioration 18DEC2023 26SEP2023 . 1001 CO Deterioration 18DEC2023 Deterioration 10JAN2024 05NOV2023 . 1001 CO Deterioration 10JAN2024 Deterioration 28JAN2024 18DEC2023 . 1001 CO Deterioration 28JAN2024 . 10JAN2024 . 2001 EF 2001 EF No change 07DEC2022 No change 15JAN2023 . . 2001 EF No change 15JAN2023 Improvement 06FEB2023 07DEC2022 . 2001 EF Improvement 06FEB2023 No change 12APR2023 15JAN2023 . 2001 EF No change 12APR2023 No change 04MAY2023 06FEB2023 . 2001 EF No change 04MAY2023 Deterioration 08JUL2023 12APR2023 . 2001 EF Deterioration 08JUL2023 Deterioration 13AUG2023 04MAY2023 . 2001 EF Deterioration 13AUG2023 No change 05SEP2023 08JUL2023 . 2001 EF No change 05SEP2023 No change 26SEP2023 13AUG2023 . 2001 EF No change 26SEP2023 No change 05NOV2023 05SEP2023 . 2001 EF No change 05NOV2023 Deterioration 18DEC2023 26SEP2023 . 2001 EF Deterioration 18DEC2023 Improvement 10JAN2024 05NOV2023 . 2001 EF Improvement 10JAN2024 Deterioration 28JAN2024 18DEC2023 . 2001 EF Deterioration 28JAN2024 . 10JAN2024 .
That is not the answer to your full question. But I think you could solve the full problem by extending the use of lags and leads and adding more logic to the calculation.
Could you post a small amount of example data you HAVE (as a CARDS data step, maybe 10-20 records) and also post the data you WANT as an output? It can be real data or fake data. Usually 10-20 records of fake data with just a few variables is enough to illustrate a problem. Then also describe your logic in terms of the example data. I think that would help. As is, without example data, I'm having a hard time understanding the spec.
data adtteqs;
input USUBJID $ ADT :date9. PARAMCD $ AVAL ABLFL $ CHGCAT1 $ RANDDT :date9.;
format ADT RANDDT date9.;
datalines;
1001 07DEC2022 CO 33.3333333 Y Deterioration 01DEC2022
1001 15JAN2023 CO 100.0 . Deterioration 01DEC2022
1001 06FEB2023 CO 100.0 . Deterioration 01DEC2022
1001 12APR2023 CO 100.0 . Deterioration 01DEC2022
1001 04MAY2023 CO 100.0 . Deterioration 01DEC2022
1001 08JUL2023 CO 100.0 . Deterioration 01DEC2022
1001 13AUG2023 CO 100.0 . Deterioration 01DEC2022
1001 05SEP2023 CO 100.0 . Deterioration 01DEC2022
1001 26SEP2023 CO 100.0 . Deterioration 01DEC2022
1001 05NOV2023 CO 66.6666667 . Deterioration 01DEC2022
1001 18DEC2023 CO 100.0 . Deterioration 01DEC2022
1001 10JAN2024 CO 66.6666667 . Deterioration 01DEC2022
1001 28JAN2024 CO 100.0 . Deterioration 01DEC2022
1001 07DEC2022 DY 0.0 Y No change 01DEC2022
1001 15JAN2023 DY 0.0 . No change 01DEC2022
1001 06FEB2023 DY 0.0 . Deterioration 01DEC2022
1001 12APR2023 DY 33.3333333 . Deterioration 01DEC2022
1001 04MAY2023 DY 0.0 . No change 01DEC2022
1001 08JUL2023 DY 33.3333333 . Deterioration 01DEC2022
1001 13AUG2023 DY 33.3333333 . Deterioration 01DEC2022
1001 05SEP2023 DY 0.0 . No change 01DEC2022
1001 26SEP2023 DY 0.0 . No change 01DEC2022
1001 05NOV2023 DY 0.0 . No change 01DEC2022
1001 18DEC2023 DY 33.3333333 . Deterioration 01DEC2022
1001 10JAN2024 DY 0.0 . No change 01DEC2022
1001 28JAN2024 DY 0.0 . No change 01DEC2022
;
run;
Thanks for the sample data.
Can you describe the desired logic / spec in terms of this data? The spec you posted isn't clear to me. It starts with "4) Else If... "
After describing the logic / spec, can you post the output data you would want to create from your sample data (also as a CARDS data step)?
Hi Quentin. I agree it is not the clearest. so I think generally the adt record we are looking for has to be >= RANDT (Randomisation date). So for the deterioration scenario, if a record (record 1) within a parameter has a chcgat1 value of "Deterioration" and the earliest record immediately following (at least 14 days apart) has a chgcat1 with a value of "Deterioration" then it is a confirmed deterioration ("If it is "No Change" or something other than "Deterioration" then it is essentially null in void) and I would like to create a flag that as I need record 1 ADT to do calculations for AVAL and derive and EVNTDESC ="Deterioration" and a CNSR value of 0. The only additional logic to make it a valid record after this is to make sure that record 1 value of deterioration had a prior evaluable record within 48 days (if there isn't then even if there is a next record chgcat1 ="Deterioration" after the first occurrence it would be also null in void). There could be instances where chgcat1 is deterioration for all records but I just want the first confirmed record.
data adtteqs;
length COMMENT $100;
infile datalines dsd dlm='|';
input USUBJID :$10. ADT :date9. PARAMCD :$2. AVAL :8.2 ABLFL :$1. CHGCAT1 :$20. RANDDT :date9. COMMENT :$100.;
format ADT RANDDT date9.;
datalines;
1001|07DEC2022|CO|33.3333333|Y| |01DEC2022|
1001|15JAN2023|CO|100.0||Deterioration|01DEC2022|Use this ADT
1001|06FEB2023|CO|100.0||Deterioration|01DEC2022|confirms initial chgcat1
1001|12APR2023|CO|100.0||Deterioration|01DEC2022|
1001|04MAY2023|CO|100.0||Deterioration|01DEC2022|
1001|08JUL2023|CO|100.0||Deterioration|01DEC2022|
1001|13AUG2023|CO|100.0||Deterioration|01DEC2022|
1001|05SEP2023|CO|100.0||Deterioration|01DEC2022|
1001|26SEP2023|CO|100.0||Deterioration|01DEC2022|
1001|05NOV2023|CO|66.6666667||Deterioration|01DEC2022|
1001|18DEC2023|CO|100.0||Deterioration|01DEC2022|
1001|10JAN2024|CO|66.6666667||Deterioration|01DEC2022|
1001|28JAN2024|CO|100.0||Deterioration|01DEC2022|
;
run;
Plus you can get scenarios like this
data adtteqs;
length COMMENT $100;
infile datalines dsd dlm='|';
input USUBJID :$10. ADT :date9. ADY PARAMCD :$2. AVAL :8.2 ABLFL :$1. CHGCAT1 :$20. RANDDT :date9. COMMENT :$100.;
format ADT RANDDT date9.;
datalines;
1001|07DEC2022|2|EF|41.6666667|Y|No change|01DEC2022|
1001|15JAN2023|41|EF|50||No change|01DEC2022|
1001|06FEB2023|62|EF|66.6666667||Improvement|01DEC2022|
1001|12APR2023|128|EF|41.6666667||No change|01DEC2022|
1001|04MAY2023|150|EF|25||No change|01DEC2022|
1001|08JUL2023|215|EF|25||Deterioration|01DEC2022|No prior record within 48 days|
1001|13AUG2023|251|EF|33.3333333||Deterioration|01DEC2022|No confirmation on next record|
1001|05SEP2023|274|EF|33.3333333||No change|01DEC2022|
1001|26SEP2023|295|EF|33.3333333||No change|01DEC2022|
1001|05NOV2023|335|EF|33.3333333||No change|01DEC2022|
1001|18DEC2023|378|EF|25||Deterioration|01DEC2022|No confirmation on next record|
1001|10JAN2024|401|EF|58.3333333||Improvement|01DEC2022|
1001|28JAN2024|419|EF|25||Deterioration|01DEC2022|Valid deterioration record as prior evaluable record with 48 days
;
run;
The deterioration after 2 or more missed assessments is similar except if we have a record within a parameter that has a chgcat1 value of "Deterioration" and the next record chgcat1 ="Deterioration" (next visit at least 14 days apart) and the prior evaluable record is greater than or equal to 48 days (which represents 2 or more missed assessments) then we use the ADT from the prior evaluable record that is >= 48 as we will have to censor since they missed too many assessments and cant be used. so then we would have an EVNTDESC =""Deterioration after 2 or more missed assessments" and a CNSR value of 1 but a flag here would be all i need.
data adtteqs;
length COMMENT $100;
infile datalines dsd dlm='|';
input USUBJID :$20. ADT :date9. PARAMCD :$2. AVAL :8.2 ABLFL :$1. CHGCAT1 :$20. COMMENT :$100.;
format ADT date9.;
datalines;
1011|07DEC2022|DY|0|Y|No change|
1011|15JAN2023|DY|0||No change|
1011|06FEB2023|DY|0||Deterioration|
1011|12APR2023|DY|33.3333333||No change|
1011|04MAY2023|DY|0||No change|Date to be used
1011|08JUL2023|DY|33.3333333||Deterioration|prior record greater than 48 days
1011|13AUG2023|DY|33.3333333||Deterioration|confirmation on next record
1011|05SEP2023|DY|0||No change|
1011|26SEP2023|DY|0||No change|
1011|05NOV2023|DY|0||No change|
1011|18DEC2023|DY|33.3333333||Deterioration|
1011|10JAN2024|DY|0||No change|
1011|28JAN2024|DY|0||No change|
;
run;
Also if there is a single assessment of deterioration at the end with no further assessments at the and prior evaluable record is greater than or equal to then we use the prior evaluable date.
data adtteqs;
length COMMENT $100;
infile datalines dsd dlm='|';
input USUBJID :$20. ADT :date9. PARAMCD :$2. AVAL :8.2 ABLFL :$1. CHGCAT1 :$20. COMMENT :$100.;
format ADT date9.;
datalines;
1001|22FEB2022|FI|0|Y|No change|
1001|07MAY2022|FI|0||No change|
1001|01JUN2022|FI|0||No change|
1001|20AUG2022|FI|0||No change|
1001|10SEP2022|FI|0||No change|
1001|03OCT2022|FI|0||No change|
1001|01JAN2023|FI|0||No change|
1001|23JAN2023|FI|0||No change|
1001|16FEB2023|FI|0||No change|
1001|09MAR2023|FI|0||No change|
1001|29MAR2023|FI|0||No change|
1001|18APR2023|FI|33.3333333||Deterioration|
1001|08MAY2023|FI|0||No change|
1001|29MAY2023|FI|0||No change|
1001|21JUN2023|FI|0||No change|
1001|08JUL2023|FI|0||No change|
1001|01AUG2023|FI|0||No change|
1001|22AUG2023|FI|0||No change|
1001|12SEP2023|FI|0||No change|
1001|02OCT2023|FI|0||No change|
1001|25OCT2023|FI|0||No change|
1001|17NOV2023|FI|0||No change|Date to be used (greater than 48 days)
1001|15JAN2024|FI|33.3333333||Deterioration|single deterioration with no more assessments
;
run;
So ideally I would like to retain the correct row with the correct adt for each parameter per subject that satisfies either a "deterioration" or "deterioration after 2 or more missed assessments" with a flag or description variable that indicates which is which.
I'm still not understanding everything, but I think I would approach this as a problem where you need to do lags (look-backs) and leads (look-aheads).
I'm a big fan of Mark Keintz's papers on lags and leads, e.g. : https://www.lexjansen.com/wuss/2014/24_Final_Paper_PDF.pdf
I took your first two example steps, and made one example with two subjects:
data have;
length COMMENT $100;
infile datalines dsd dlm='|';
input USUBJID :$10. ADT :date9. PARAMCD :$2. AVAL :8.2 ABLFL :$1. CHGCAT1 :$20. RANDDT :date9. COMMENT :$100.;
format ADT RANDDT date9.;
datalines;
1001|07DEC2022|CO|33.3333333|Y| |01DEC2022|
1001|15JAN2023|CO|100.0||Deterioration|01DEC2022|Use this ADT
1001|06FEB2023|CO|100.0||Deterioration|01DEC2022|confirms initial chgcat1
1001|12APR2023|CO|100.0||Deterioration|01DEC2022|
1001|04MAY2023|CO|100.0||Deterioration|01DEC2022|
1001|08JUL2023|CO|100.0||Deterioration|01DEC2022|
1001|13AUG2023|CO|100.0||Deterioration|01DEC2022|
1001|05SEP2023|CO|100.0||Deterioration|01DEC2022|
1001|26SEP2023|CO|100.0||Deterioration|01DEC2022|
1001|05NOV2023|CO|66.6666667||Deterioration|01DEC2022|
1001|18DEC2023|CO|100.0||Deterioration|01DEC2022|
1001|10JAN2024|CO|66.6666667||Deterioration|01DEC2022|
1001|28JAN2024|CO|100.0||Deterioration|01DEC2022|
2001|07DEC2022|EF|41.6666667|Y|No change|01DEC2022|
2001|15JAN2023|EF|50||No change|01DEC2022|
2001|06FEB2023|EF|66.6666667||Improvement|01DEC2022|
2001|12APR2023|EF|41.6666667||No change|01DEC2022|
2001|04MAY2023|EF|25||No change|01DEC2022|
2001|08JUL2023|EF|25||Deterioration|01DEC2022|No prior record within 48 days|
2001|13AUG2023|EF|33.3333333||Deterioration|01DEC2022|No confirmation on next record|
2001|05SEP2023|EF|33.3333333||No change|01DEC2022|
2001|26SEP2023|EF|33.3333333||No change|01DEC2022|
2001|05NOV2023|EF|33.3333333||No change|01DEC2022|
2001|18DEC2023|EF|25||Deterioration|01DEC2022|No confirmation on next record|
2001|10JAN2024|EF|58.3333333||Improvement|01DEC2022|
2001|28JAN2024|EF|25||Deterioration|01DEC2022|Valid deterioration record as prior evaluable record with 48 days
;
run;
I think your basic logic is something like: Calculate DeteriorationDate if within a subjectid-paramcd:
For that you would need the lead of chgcat1, the lead of adt, and the lag of adt.
You could approach it like below (this assumes data are already sorted chronologically):
data want ;
set have ;
by usubjid paramcd ;
*lead ;
if eof1=0 then set have (firstobs=2 keep=chgcat1 adt rename=(chgcat1=leadchgcat adt=leadadt)) end=eof1;
if last.paramcd then call missing(leadchgcat,leadadt) ;
*lag ;
lagadt=ifn(first.paramcd=0,lag(adt),.) ;
if first.paramcd then call missing(_HasDeteriorationDate) ;
if NOT (_HasDeteriorationDate) then do ;
if (chgcat1=leadchgcat="Deterioration")
and ((leadadt-adt) > 14)
and (.Z < (adt - lagadt) < 48)
then do ;
DeteriorationDate=adt ;
_HasDeteriorationDate++1 ;
end ;
end ;
drop _: ;
format lagadt DeteriorationDate date9. ;
run ;
proc print data=want;
var usubjid paramcd chgcat1 adt leadchgcat leadadt lagadt DeteriorationDate;
by usubjid paramcd ;
id usubjid paramcd ;
run ;
This will flag one DeterioriationDate:
Deterioration USUBJID PARAMCD USUBJID PARAMCD CHGCAT1 ADT leadchgcat leadadt lagadt Date 1001 CO 1001 CO 07DEC2022 Deterioration 15JAN2023 . . 1001 CO Deterioration 15JAN2023 Deterioration 06FEB2023 07DEC2022 15JAN2023 1001 CO Deterioration 06FEB2023 Deterioration 12APR2023 15JAN2023 . 1001 CO Deterioration 12APR2023 Deterioration 04MAY2023 06FEB2023 . 1001 CO Deterioration 04MAY2023 Deterioration 08JUL2023 12APR2023 . 1001 CO Deterioration 08JUL2023 Deterioration 13AUG2023 04MAY2023 . 1001 CO Deterioration 13AUG2023 Deterioration 05SEP2023 08JUL2023 . 1001 CO Deterioration 05SEP2023 Deterioration 26SEP2023 13AUG2023 . 1001 CO Deterioration 26SEP2023 Deterioration 05NOV2023 05SEP2023 . 1001 CO Deterioration 05NOV2023 Deterioration 18DEC2023 26SEP2023 . 1001 CO Deterioration 18DEC2023 Deterioration 10JAN2024 05NOV2023 . 1001 CO Deterioration 10JAN2024 Deterioration 28JAN2024 18DEC2023 . 1001 CO Deterioration 28JAN2024 . 10JAN2024 . 2001 EF 2001 EF No change 07DEC2022 No change 15JAN2023 . . 2001 EF No change 15JAN2023 Improvement 06FEB2023 07DEC2022 . 2001 EF Improvement 06FEB2023 No change 12APR2023 15JAN2023 . 2001 EF No change 12APR2023 No change 04MAY2023 06FEB2023 . 2001 EF No change 04MAY2023 Deterioration 08JUL2023 12APR2023 . 2001 EF Deterioration 08JUL2023 Deterioration 13AUG2023 04MAY2023 . 2001 EF Deterioration 13AUG2023 No change 05SEP2023 08JUL2023 . 2001 EF No change 05SEP2023 No change 26SEP2023 13AUG2023 . 2001 EF No change 26SEP2023 No change 05NOV2023 05SEP2023 . 2001 EF No change 05NOV2023 Deterioration 18DEC2023 26SEP2023 . 2001 EF Deterioration 18DEC2023 Improvement 10JAN2024 05NOV2023 . 2001 EF Improvement 10JAN2024 Deterioration 28JAN2024 18DEC2023 . 2001 EF Deterioration 28JAN2024 . 10JAN2024 .
That is not the answer to your full question. But I think you could solve the full problem by extending the use of lags and leads and adding more logic to the calculation.
Thanks Quentin I will try this. yeah since my initial post I tried using arrays like this but the results are not consistent . The only thing I noticed for your code was that the last record should be 28JAN2024 for EF as there was no confirmed deterioration records with an evaluable prior visit but since the last record has chgcat1 ="Deterioration" and no further assessments and a prior record lt 48 days it can be considered valid. I think this added layer for both is causing me the most headaches . if I can get this logic i think I can just adjust for the deterioration after 2 or more missed assessments criteria . I need to up my PROC FCMP game 🙂
data deterioration_checks(keep=usubjid paramcd chgcat1 avisit adt days_between_visits first_deterdt next_deterioration confirmed_deterioration prior_evaluable_record first_occurrence_count last_deterioration_flag);
set ADC30_M;
by usubjid paramcd adt;
format first_deterdt date9. days_between_visits 8.;
retain first_deterdt next_deterioration confirmed_deterioration prior_evaluable_record first_occurrence_count last_deterioration_flag;
array adt_array[2] _temporary_;
array chgcat1_array[2] $40 _temporary_;
array aval_array[2] _temporary_;
/* Initialize arrays to hold previous values */
adt_array[1] = adt_array[2];
chgcat1_array[1] = chgcat1_array[2];
aval_array[1] = aval_array[2];
adt_array[2] = adt;
chgcat1_array[2] = chgcat1;
aval_array[2] = aval;
if first.paramcd then do;
first_deterdt = .;
next_deterioration = 0;
confirmed_deterioration = 0;
prior_evaluable_record = 0;
first_occurrence_count = 0;
last_deterioration_flag = 0;
days_between_visits = .;
adt_array[1] = .;
chgcat1_array[1] = "";
aval_array[1] = .;
end;
/* Calculate days between visits */
if not missing(adt_array[1]) then days_between_visits = adt - adt_array[1];
else days_between_visits = .;
/* Check for prior evaluable record within 48 days */
if not missing(aval_array[1]) and days_between_visits <= 48 then prior_evaluable_record = 1;
/* Identify the first occurrence of deterioration */
if chgcat1 = "Deterioration" and adt >= randdt and first_deterdt = . then first_deterdt = adt;
/* Check if the subsequent visit is also a deterioration */
if first_deterdt ne . then do;
if adt >= first_deterdt + 14 and chgcat1 = "Deterioration" then do;
next_deterioration = 1;
confirmed_deterioration = 1;
end;
else if chgcat1 ne "Deterioration" then do;
next_deterioration = 0;
first_deterdt = .; /* Reset if subsequent visit is not deterioration */
end;
end;
/* Count the first occurrence when all conditions are met */
if next_deterioration = 1 and confirmed_deterioration = 1 and prior_evaluable_record = 1 then first_occurrence_count + 1;
/* Mark the last visit for paramcd as deterioration if applicable */
if last.paramcd then do;
if chgcat1 = "Deterioration" and adt >= randdt then do;
last_deterioration_flag = 1;
confirmed_deterioration = 1;
end;
/* Calculate days between visits for the last record */
if not missing(last_adt) then days_between_visits = adt - last_adt;
else days_between_visits = .;
/* Output the last record for the paramcd group */
output;
/* Reset flags for the next paramcd */
first_deterdt = .;
next_deterioration = 0;
confirmed_deterioration = 0;
prior_evaluable_record = 0;
first_occurrence_count = 0;
last_deterioration_flag = 0;
days_between_visits = .;
adt_array[1] = .;
chgcat1_array[1] = "";
aval_array[1] = .;
end;
/* Output the flag for each record */
output;
run;
I think this is it
data deterioration1;
set ADC30_M;
by usubjid paramcd;
retain _HasDeteriorationDateFlag 0;
*lead;
if eof1=0 then set ADC30_M (firstobs=2 keep=chgcat1 adt rename=(chgcat1=leadchgcat adt=leadadt)) end=eof1;
if last.paramcd then call missing(leadchgcat,leadadt);
*lag;
lagadt=ifn(first.paramcd=0,lag(adt),.);
if first.paramcd then do;
call missing(_HasDeteriorationDate);
_HasDeteriorationDateFlag = 0;
end;
if NOT (_HasDeteriorationDate) then do;
if (chgcat1=leadchgcat="Deterioration")
and ((leadadt-adt) > 14)
and (.Z < (adt - lagadt) < 48)
then do;
DeteriorationDate = adt;
_HasDeteriorationDate ++ 1;
_HasDeteriorationDateFlag = 1;
end;
end;
if last.paramcd and _HasDeteriorationDateFlag = 0 and chgcat1 = "Deterioration" then do;
if .Z < (adt - lagadt) < 48 then do;
DeteriorationDate = adt;
end;
end;
drop _: _HasDeteriorationDateFlag;
format lagadt DeteriorationDate date9.;
if not missing(DETERIORATIONDATE) then deterfl = "Y";
if deterfl="Y";
KEEP USUBJID subjid ADT PARAMCD AVAL ABLFL CHGCAT1 RANDDT deterfl LEADADT LEADCHGCAT LAGADT DETERIORATIONDATE;
run;
data deterioration2;
set ADC30_M;
by usubjid paramcd adt;
retain _HasDeteriorationDateFlag 0;
*lead;
if eof1=0 then set ADC30_M (firstobs=2 keep=chgcat1 adt rename=(chgcat1=leadchgcat adt=leadadt)) end=eof1;
if last.paramcd then call missing(leadchgcat,leadadt);
*lag;
lagadt=ifn(first.paramcd=0,lag(adt),.);
if first.paramcd then do;
call missing(_HasDeteriorationDate);
_HasDeteriorationDateFlag = 0;
end;
if ablfl="Y" and missing(lagadt) then lagadt=adt;
if NOT (_HasDeteriorationDate) then do;
if (chgcat1=leadchgcat="Deterioration")
and ((leadadt-adt) > 14)
and (.Z <= (adt - lagadt) >= 48)
then do;
DeteriorationDate = adt;
_HasDeteriorationDate ++ 1;
_HasDeteriorationDateFlag = 1;
end;
end;
if last.paramcd and _HasDeteriorationDateFlag = 0 and chgcat1 = "Deterioration" then do;
if .Z <= (adt - lagadt) >= 48 then do;
DeteriorationDate = LAGadt;
end;
end;
drop _: _HasDeteriorationDateFlag;
format lagadt DeteriorationDate date9.;
if not missing(DETERIORATIONDATE) then deterfl2 = "Y";
if deterfl2="Y";
KEEP USUBJID subjid ADT PARAMCD AVAL ABLFL CHGCAT1 RANDDT deterfl2 LEADADT LEADCHGCAT LAGADT DETERIORATIONDATE;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.