I would like to understand what it means (in the clinical sense and from a statistics perspective) when you run a K-M analysis by stratifying on the censoring variable. E.g.
Proc lifetest;
time PFS*censor (1);
strata censor ;
run;
I am asking this question because the stratifying variable in my analysis is highly correlated with the censor variable. I would like to gather enough talking points to discourage the approach. It would be great to hear your thoughts and opinions.
Thank you!
It depends on what you're analyzing or trying to measure.
If you're looking at survival times, that's exactly why you use censoring/survival analysis.
Not including the censored records shows a very different pattern, which is why it's dangerous to only use 'complete' information.
To get a genuine picture of the situation you also need to account for the fact that follow-up is incomplete, or whatever other reason there is for censoring. There's nothing wrong with 'looking' at it this way, but it's not how you analyze the information to understand your survival times. It also matters how much censoring you have. If the censoring is more than 25%, you start to have questionable results in my experience. If the censored observations show a different pattern, that's even more concerning, because it means there's some difference in survival, and why these records are being censored would be of interest. It usually means there's something wrong with the treatment or service. If 30% can't complete the treatment protocol and you analyze only the people who do complete it, that can be very misleading.
From a mathematical point of view, it doesn't make sense to stratify on the censor variable, because doing so introduces conditioning on the future. Within the stratum of subjects who had the event, the Kaplan-Meier curve shows the probability that an event happens before time t, conditional on the event happening at some future time before the person is censored. That will surely lead to overestimation of the probability that events happen.
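To make this concrete, here is a minimal sketch on simulated data. The dataset name sim, the variables event_time and cens_time, the seed, and the exponential means are all assumptions chosen for illustration; only PFS, censor, and the PROC LIFETEST statements come from the question. It runs the usual analysis and then the version stratified on censor, so the two sets of estimates can be compared.

```sas
/* Hypothetical simulated data; names and parameters are illustrative only */
data sim;
  call streaminit(2025);
  do id = 1 to 200;
    event_time = 12 * rand('EXPONENTIAL');  /* latent time to progression */
    cens_time  = 18 * rand('EXPONENTIAL');  /* latent censoring time      */
    PFS    = min(event_time, cens_time);    /* observed follow-up time    */
    censor = (cens_time < event_time);      /* 1 = censored, 0 = event    */
    output;
  end;
  keep id PFS censor;
run;

/* Usual analysis: censored subjects stay in the risk set until censored */
proc lifetest data=sim;
  time PFS*censor(1);
run;

/* Stratified on the censoring indicator, as in the question */
proc lifetest data=sim;
  time PFS*censor(1);
  strata censor;
run;
```

In the second run, the censor=0 stratum contains no censored subjects, so its curve is just the empirical distribution of the observed event times and has to fall to 0. The censor=1 stratum contains no events, so its estimate stays at 1. Neither curve describes the survival experience of the full cohort.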
Thanks to both of you for the response and explanation