Hi-the imagined dataset is below -the goal is tie a specific event to the first date in which something happened-
In this example, the censor events happen in chronological order-either the event happened, death, or EOF (end of follow up).
It takes the first date that a censor event occurred or uses 30 days after the discharge date if no event occurred.
I have proc sql code below that uses when statements. This is fine for 2 events (event1 and death) but this would be cumbersome if there 10-20 events. Not sure if there is a more efficient way in that if time 5 is used it corresponds to event 5 and if all event (and death) dates are null then simply takes the 30 days after the discharge and gives it the proper category of 'eof' or something to that effect.
It can be either datastep or sql-either one.
TIA.
data have;
informat ID 8. Discharge MMDDYY10. EVENT 8. EVENT_DT MMDDYY10. DEATH 8. DEATH_DT MMDDYY10.;
input ID Discharge EVENT EVENT_DT DEATH DEATH_DT ;
format Discharge DDMMYY10. EVENT_DT DDMMYY10. Death_dt DDMMYY10. ;
datalines;
1 01/01/2020 1 01/15/2020 0 01/20/2020
2 01/01/2020 0 . 1 01/17/2020
3 01/01/2020 0 . 0 .
;
run;
proc sql;
select a.*,
min(EVENT_DT,DEATH_DT, intnx('DAY', Discharge,30)) format=date9. as censor_DATE,
case when EVENT=1 then 'EVENT'
when EVENT=0 AND DEATH=1 THEN 'DEATH'
when EVENT=0 AND DEATH=0 THEN 'EOF ' END as censor_event
from have A;
Wouldn't it make more sense to start with the data in more natural configuration?
Something like:
data visits;
input id visit $ date :mmddyy. ;
format date yymmdd10.;
cards;
1 DISCHARGE 01/01/2020
1 EVENT 01/15/2020
1 FOLLOW_UP 01/20/2020
2 DISCHARGE 01/01/2020
2 DEATH 01/17/2020
3 DISCHARGE 01/01/2020
;
Then you could just process it by ID and DATE to figure out whatever it was you wanted to calculate.
Can you please make up an example that has 10 events and explain in words exactly what the desired result is?
Wouldn't it make more sense to start with the data in more natural configuration?
Something like:
data visits;
input id visit $ date :mmddyy. ;
format date yymmdd10.;
cards;
1 DISCHARGE 01/01/2020
1 EVENT 01/15/2020
1 FOLLOW_UP 01/20/2020
2 DISCHARGE 01/01/2020
2 DEATH 01/17/2020
3 DISCHARGE 01/01/2020
;
Then you could just process it by ID and DATE to figure out whatever it was you wanted to calculate.
@Tom -I agree with you but unfortunately the real data is similar to the format I presented with additional other vars. I was actually programming to use a proc contents to get all the date variables and then do a proc transpose to reshape the data in the format you propose when I saw your comments.
I was seeing if there was a more efficient method to not reshape the data. I think not.
Best,
Lawrence
The way to handle multiple variables in a similar way is with ARRAY in a data step.
So you have one array of the FLAG variables that indicate whether or not the event occurred. Another array of the date variables.
I am not sure I understand your logic but if you wanted to calculate some type of right censored survival variable for many different possible events then it should be possible.
Is the goal to calculate the FIRST event type that occurred and if none occurred then use the max follow up date plus 30?
data want;
set have;
array flags event death ;
array dates event_dt death_dt ;
length event_type $32 date_to_use 8;
format date_to_use yymmdd10.;
date_to_use=max(of dates[*]);
do index=1 to dim(flags);
if flags[index] and dates[index]<date_to_use then do;
event_type=vname(flags[index]);
date_to_use=dates[index];
end;
end;
censored= (event_type=' ');
if censored then data_to_use=date_to_use +30;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.