Hi SAS communities,
I am performing a survival analysis where the event of interest is outpatient prescription and the competing risks include death, admission to hospital, and admission to long-term care. I have been trying to write code to find the follow-up time for each individual in the study; however, some individuals experience more than one event, and I want to calculate time to the earliest event experienced, whether it be the event of interest or a competing risk.
This is a simplified version of the dataset:
ID | index_date | prescription | service_date | death_date | hosp_admdate | LTC_admdate | death | hosp | LTC | censor | censdate |
001 | 09Sep2019 | 0 | . | 16Dec2019 | . | . | 1 | 0 | 0 | ||
002 | 01Feb2019 | 1 | 19Jun2019 | 06Jul2019 | 08May2019 | . | 1 | 1 | 0 | ||
003 | 06Dec2020 | 0 | . | . | . | . | 0 | 0 | 0 | ||
004 | 01Mar2020 | 1 | 05Mar2020 | . | . | 09Feb2020 | 0 | 0 | 1 | ||
005 | 02Dec2018 | 0 | 09Feb2019 | . | 01Mar2019 | . | 0 | 1 | 0 | ||
Service_date: date of prescription. End of study date: 10Mar2020.
I found some code online that creates a censor variable and a censdate based on the event. I am just not sure how to modify it so that it captures the multiple competing risks and takes only the earliest event as the censdate.
You should provide an example of what you want as output for your given example data.
To get the earliest date, assuming you have SAS date values, use Min(death_date, hosp_admdate, ltc_admdate).
If you want the interval between the earliest event and Index_date then
interval = Min(death_date, hosp_admdate,ltc_admdate) - index_date;
I was hoping to obtain some sort of variable that identifies the event that occurred first for the individual. For example, if death occurred first then censor=1, if hospitalization occurred first then censor=2, and if both death and hospitalization occurred but death occurred first then censor=1. I also wanted a variable that identifies the censoring date that corresponds to the date the earliest event occurred. I was then hoping to calculate follow-up time by doing censdate-index_date for each subject.
I will repeat: please show what you expect for your example data.
Your "censoring date that corresponds to the date the earliest event " is the MIN of those variables. The Min function will only return missing if all the variables have missing values.
There is a WHICHN function and, for character values, a WHICHC function, each of which returns the position of a value in a given list. So that can be used. But we need to see just what you expect to see in the data set to show example code.
I have to say that I feel like you are missing something when you state: "For example, if death occurred first then censor=1, if hospitalization occurred first then censor=2, and if both death and hospitalization occurred but death occurred first then censor=1." There is nothing in the "if both" case that modifies the result given by the first two rules. Plus, you said you have 3 dates/events. Do I have to guess that the third event is Censor=3? Guessing is a poor way to program.
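Until the expected output is known, here is only a minimal sketch of how MIN and WHICHN could be combined. It uses the values of ID 002 from the posted example; the variable names first_date and first_event are made up for illustration, and the numbering 1=death, 2=hospital, 3=LTC is an assumption that simply follows the order of the arguments.

```sas
data _null_;
  /* values of ID 002 from the example data */
  index_date   = '01Feb2019'd;
  death_date   = '06Jul2019'd;
  hosp_admdate = '08May2019'd;
  LTC_admdate  = .;

  /* MIN ignores missing arguments; it is missing only if all are missing */
  first_date = min(death_date, hosp_admdate, LTC_admdate);

  /* WHICHN returns the position of the first argument equal to first_date */
  /* assumed coding: 1=death, 2=hospital, 3=LTC (order of the arguments)   */
  if not missing(first_date) then
    first_event = whichn(first_date, death_date, hosp_admdate, LTC_admdate);

  /* days from index date to the earliest event */
  interval = first_date - index_date;

  put first_date= date9. first_event= interval=;
run;
```

For these values the hospital admission is the earliest date, so first_event should be 2. If two events share the same date, WHICHN returns the one listed first.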
This is what I am hoping to see in my data:
ID | index_date | prescription | service_date | death_date | hosp_admdate | LTC_admdate | death | hosp | LTC | censor | censdate |
001 | 09Sep2019 | 0 | . | 16Dec2019 | . | . | 1 | 0 | 0 | 1 | 16Dec2019 |
002 | 01Feb2019 | 1 | 19Jun2019 | 06Jul2019 | 08May2019 | . | 1 | 1 | 0 | 2 | 08May2019 |
003 | 06Dec2020 | 0 | . | . | . | . | 0 | 0 | 0 | 4 | 10Mar2020 |
004 | 01Mar2020 | 1 | 05Mar2020 | . | . | 09Feb2020 | 0 | 0 | 1 | 3 | 09Feb2020 |
005 | 02Dec2018 | 1 | 09Feb2019 | . | 01Mar2019 | . | 0 | 1 | 0 | 0 | 09Feb2019 |
Service_date: date of prescription. End of study date: 10Mar2020.
If first event=prescription, censor=0 and censdate=service_date.
If first event=death, censor=1 and censdate=death_date.
If first event=hospitalization, censor=2 and censdate=hosp_admdate.
If first event=LTC, censor=3 and censdate=LTC_admdate.
If no events were experienced and subjects were followed until end of study, censor=4 and censdate=end of study date.
I hope this clears things up!!
You should present your example data as a data step, like this:
data have;
  length ID $3
         index_date
         prescription
         service_date
         death_date
         hosp_admdate
         LTC_admdate
         death
         hosp
         LTC 8;
  informat
    index_date
    service_date
    death_date
    hosp_admdate
    LTC_admdate date9.;
  format
    index_date
    service_date
    death_date
    hosp_admdate
    LTC_admdate date9.;
  infile cards truncover;
  input ID--LTC;
cards;
001 09Sep2019 0 . 16Dec2019 . . 1 0 0
002 01Feb2019 1 19Jun2019 06Jul2019 08May2019 . 1 1 0
003 06Dec2020 0 . . . . 0 0 0
004 01Mar2020 1 05Mar2020 . . 09Feb2020 0 0 1
005 02Dec2018 0 09Feb2019 . 01Mar2019 . 0 1 0
;
run;
You can get the result you want like this:
data want;
  set have;
  censdate = min(service_date, death_date, hosp_admdate, LTC_admdate);
  if missing(censdate) then do;
    censor = 4;
    censdate = '10mar2020'd; /* end of study */
  end;
  else if censdate = death_date then
    censor = 1;
  else if censdate = hosp_admdate then
    censor = 2;
  else if censdate = LTC_admdate then
    censor = 3;
  else censor = 0;
  format censdate date9.;
run;
You did not describe what to do if, for example, both hospital admission and death happen on the same date. In this case death will be the end result (censor=1), as that comes first in the code. If you want different priorities, you will have to rearrange the code a bit.
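As an illustration of such a rearrangement, here is a minimal sketch that uses WHICHN so that the order of the arguments sets the tie priority, and that also computes the follow-up time (censdate - index_date) asked for in the question. The dataset name want2 and the variable followup_days are made up for illustration, and giving the prescription precedence on ties is an assumption, not something stated in the question.

```sas
data want2;
  set have;
  censdate = min(service_date, death_date, hosp_admdate, LTC_admdate);
  if missing(censdate) then do;
    censor = 4;
    censdate = '10mar2020'd; /* end of study */
  end;
  /* WHICHN returns the position of the first date equal to censdate,  */
  /* so on ties the event listed first wins (assumed: prescription).   */
  /* positions 1..4 map to censor 0=prescription, 1=death, 2=hosp, 3=LTC */
  else censor = whichn(censdate, service_date, death_date,
                       hosp_admdate, LTC_admdate) - 1;
  /* follow-up time in days from the index date */
  followup_days = censdate - index_date;
  format censdate date9.;
run;
```

To change the priorities, the arguments after censdate would have to be reordered and the mapping to the censor codes adjusted accordingly. Note that with the posted example data followup_days comes out negative for ID 003 (index date after the end of study) and ID 004 (LTC admission before the index date), so such records are worth checking before the analysis.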