LOCF is easy to implement using the UDPATE statement.
I don't like to re-type long list of variable names so let's put that list into a macro variable.
%let keys=subject aesoccalc aeptcalc
temporary_suppae_qval_ctcae temporary_ae_aeterm
;
proc sort data=sample_input out=step1;
BY &keys aestdat visno recordid ;
run;
data step2;
update step1(obs=0) step1;
by &keys aestdat;
output;
run;
But that will not back fill the missing dates that appeared first in a group.
You could fix that by changing the SORT ORDER so a non missing end date (if one exists) appears on the first observation.
proc sort data=sample_input out=step1;
by &keys aestdat descending aeendat ;
run;
If there are other variables (like the VISNO and RECORDID variable you mentioned) that you do not want carried forward then add an extra SET statement to re-read the observation after the UPDATE and before the OUTPUT.
For example to only carry forward AEENDAT you could use:
data step2;
update step1(obs=0) step1;
by &keys aestdat;
set step1(drop=aeendat);
output;
run;
Note the LOCF logic will not handle the cases where there are multiple different AEENDAT values for the same start date. For that you need to summarize first and then use the new summary variable instead of the original variable. So assuming you want to use the MAX AEENDAT in that case your code for step2 might look like this.
data step2;
do until(last.aestdat);
set step1;
by &keys aestdat;
max_end = max(max_end,aeendat);
end;
do until(last.aestdat);
set step1;
by &keys aestdat;
aeendat=max_end;
output;
end;
drop max_end;
run;
Once you have consistent AESTDAT and AEENDAT values then calculating the AEGROUP variable is simple and does not require any look ahead.
Just increment the AEGROUP variable when you see that AESTDAT is larger than the previous AEENDAT value. Be careful of the cases with only missing values of AEENDAT. Also do not get confused by end dates from a previous AE term. Reset the counter for each new subject.
data step5 ;
set step2 ;
by &keys aestdat;
if first.temporary_ae_aeterm
or (aestdat>lag(aeendat)>.Z) then aegroup+1;
if first.subject then aegroup=1;
run;
Now finding the first date and flagging the last observation is simple.
data want;
set step5;
by &keys aegroup ;
length lasteventfl $1 firstdt 8;
if first.aegroup then firstdt = aestdat;
if last.aegroup then lasteventfl='Y';
else lasteventfl='N';
retain firstdt;
format firstdt date9.;
run;
So running that on your example
data sample_input ;
infile datalines dsd truncover;
input subject :$10. aesoccalc :$50. aeptcalc :$50.
temporary_suppae_qval_ctcae :$50. temporary_ae_aeterm :$50.
visno :$20. aestdat :date. aeendat :date. recordid
;
format aestdat date9. aeendat date9. ;
datalines4;
001,Infections,Skin infection,Skin infection,SKIN INFECTION,Cycle 08,15OCT2023,,1
001,Infections,Skin infection,Skin infection,SKIN INFECTION,Cycle 15,15OCT2023,,2
001,Infections,Skin infection,Skin infection,SKIN INFECTION,Cycle 28,15OCT2023,,3
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 02,22APR2023,15MAY2023,4
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 03,22APR2023,15MAY2023,5
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 13,15MAY2023,25MAY2023,6
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 14,08APR2024,28DEC2024,7
003,GI disorders,Nausea,Nausea,NAUSEA,Cycle 01,01MAR2024,28MAY2024,8
003,GI disorders,Nausea,Nausea,NAUSEA,Cycle 02,28MAY2024,30JUN2024,9
;;;;
data expected_output ;
infile datalines dsd truncover;
input subject :$10. aesoccalc :$50. aeptcalc :$50.
temporary_suppae_qval_ctcae :$50. temporary_ae_aeterm :$50.
visno :$20. aestdat :date. aeendat :date. recordid lasteventfl :$1.
firstdt :date. aegroup
;
format aestdat date9. aeendat date9. firstdt date9. ;
datalines4;
001,Infections,Skin infection,Skin infection,SKIN INFECTION,Cycle 08,15OCT2023,,1,N,15OCT2023,1
001,Infections,Skin infection,Skin infection,SKIN INFECTION,Cycle 15,15OCT2023,,2,N,15OCT2023,1
001,Infections,Skin infection,Skin infection,SKIN INFECTION,Cycle 28,15OCT2023,,3,Y,15OCT2023,1
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 02,22APR2023,15MAY2023,4,N,22APR2023,1
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 03,22APR2023,15MAY2023,5,N,22APR2023,1
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 13,15MAY2023,25MAY2023,6,Y,15MAY2023,2
002,Skin disorders,Rash,Rash maculo-papular,MACULO-PAPULAR RASH,Cycle 14,08APR2024,28DEC2024,7,Y,08APR2024,3
003,GI disorders,Nausea,Nausea,NAUSEA,Cycle 01,01MAR2024,28MAY2024,8,N,01MAR2024,1
003,GI disorders,Nausea,Nausea,NAUSEA,Cycle 02,28MAY2024,30JUN2024,9,Y,01MAR2024,1
;;;;
%let keys=subject aesoccalc aeptcalc
temporary_suppae_qval_ctcae temporary_ae_aeterm
;
* Step 1;
proc sort data=sample_input out=step1;
by &keys aestdat visno recordid ;
run;
* Step 2;
data step2;
do until(last.aestdat);
set step1;
by &keys aestdat;
max_end = max(max_end,aeendat);
end;
do until(last.aestdat);
set step1;
by &keys aestdat;
aeendat=max_end;
output;
end;
drop max_end;
run;
* Step 5;
data step5 ;
set step2 ;
by &keys aestdat;
if first.temporary_ae_aeterm
or (aestdat>lag(aeendat)>.Z) then aegroup+1;
if first.subject then aegroup=1;
run;
* Step 3 and 4 ;
data want;
set step5;
by &keys aegroup ;
length lasteventfl $1 firstdt 8;
if first.aegroup then firstdt = aestdat;
if last.aegroup then lasteventfl='Y';
else lasteventfl='N';
retain firstdt;
format firstdt date9.;
run;
proc print data=want(drop=aesoccalc aeptcalc temporary_suppae_qval_ctcae ); run;
proc print data=expected_output(drop=aesoccalc aeptcalc temporary_suppae_qval_ctcae ); run;
proc compare data=want compare=expected_output;
run;
finds one difference with your expected output.
For SUBJECT 003 your expected results collapsed records 8 and 9 into one group. Why? Is it because the start and end dates are the same? But you did not that for the similar situation in SUBJECT 002 between records 5 and 6.
If you don't want to collapse when the start = previous end then just change the test:
if first.temporary_ae_aeterm
or (aestdat+1>lag(aeendat)>.Z) then aegroup+1;
Results with adjacent records NOT collapsed:
... View more