05-05-2014 09:41 PM
I have fitted a Cox regression and produced Kaplan-Meier curves and life tables for a survival analysis of a very large, censored sample (N ≈ 120,000).
The following chart shows one of the hazard rate charts, stratified by age group. Three groups have a distinct U-shape, whereas the other two do not.
I am unsure whether this clearly indicates a violation of the proportional hazards assumption. I have also followed the SAS documentation to test the assumption, but with such a large sample the results are unclear: some of the time-dependent tests are significant, yet the other charts show no violation at all. This is essentially the only chart that suggests anything might be awry.
(FYI, age is not grouped in the regression itself.)
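A chart like the one described can be produced with PROC LIFETEST, which also draws the log-log survival plot that is the usual graphical check of proportional hazards. This is a minimal sketch: the cutpoints in the STRATA statement (3, 6, 9, 12) are placeholders that happen to give five groups, not the bands used in the original chart. Life-table hazard estimates in intervals with few children at risk can be noisy, so it is worth checking the number at risk per interval before reading much into a U-shape.

```sas
ods graphics on;

proc lifetest data=derived.analysis method=lt plots=(h, lls);
   where baselineage < 16;                  /* same exclusion as the Cox model */
   time timetoevent2*event2censored(1);     /* event2censored = 1 marks censored records */
   /* Numeric interval strata: <3, [3,6), [6,9), [9,12), >=12 (placeholder cutpoints) */
   strata baselineage(3, 6, 9, 12);
run;
```

Roughly parallel curves in the LLS panel are consistent with proportional hazards; the hazard panel shows the shape of each group's hazard but not whether the groups are proportional to one another.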
05-06-2014 12:01 AM
A couple of things come to mind. Are the groups balanced in terms of numbers?
Does it make logical sense to expect these groups to have different event rates, or is something else going on?
05-06-2014 12:20 AM
No, the groups are not equally balanced (this is an observational study).
This is a study of children; age cuts off at 18. We omitted children aged 16 and 17 from this analysis so that the remaining children would have an opportunity for the event to occur (roughly coinciding with the follow-up time). We expect age to be negatively correlated with event occurrence.
I suppose I'm curious whether there is a data issue or whether the function illustrated could be legitimate.
05-06-2014 11:14 AM
I'm not sure that I have understood this study design. I would like to know more about what the underlying time axis is and how you have stratified. Can you show the PROC PHREG code?
The basic rule in survival analysis is that no conditioning on the future is allowed. If age is the underlying time scale, then omitting children aged 16 and 17 is not legitimate, because for children who are not yet 16 it cannot be determined at entry whether they should be included or not. Instead, I suggest censoring children when they turn 16.
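The suggestion to censor at age 16 rather than exclude can be sketched as follows, with age as the time scale and each child entering the risk set at their baseline age (delayed entry). This assumes BASELINEAGE and TIMETOEVENT2 are both measured in years; if follow-up is in days, divide it by 365.25 first. AGESCALE, ENTRYAGE, EXITAGE and STATUS are hypothetical names introduced for illustration.

```sas
data agescale;
   set derived.analysis;
   entryage = baselineage;                  /* assumed: age in years at baseline */
   exitage  = baselineage + timetoevent2;   /* assumed: follow-up also in years  */
   status   = event2censored;               /* keep the original coding: 1 = censored */
   if exitage > 16 then do;
      exitage = 16;                         /* stop the clock at the 16th birthday */
      status  = 1;                          /* and treat the child as censored there */
   end;
   if exitage > entryage;                   /* drops children entering at or after 16 */
run;

proc phreg data=agescale;
   class var1 var2;
   /* Counting-process input: risk sets are formed on age, with entry at ENTRYAGE. */
   /* Age is now the time axis, so BASELINEAGE is not entered as a covariate here. */
   model (entryage, exitage)*status(1) = var1 var2 / rl;
run;
```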
05-06-2014 08:02 PM
Thanks. No, we do not use age as the time variable; age at baseline is included as a covariate. I've been asked to analyse time to second event, and the request was for a Cox regression. I have been hesitant to do so because I am concerned about how age influences the outcome, and I have been asked to demonstrate how the model is inadequate. Given the large sample, I am finding it difficult to establish that the violation is large enough to justify not performing the regression. For example, the log-log plots are parallel but show some very minor crossing at the start and end of the period. My boss felt that wasn't a large enough violation, but I am a little more pedantic about these things. So I am trying to establish whether the hazard rate chart shown above is also a good indication that the assumptions are violated.
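One way to put a size on the violation, rather than relying on p-values that a sample of 120,000 will make significant even for trivial departures, is to let the age effect vary with follow-up time. This sketch adds a time-dependent covariate, baseline age multiplied by log follow-up time, through a PHREG programming statement; AGE_LOGT and PE_TV are hypothetical names. The estimated log-hazard ratio per year of baseline age at time t is then b_age + b_int*log(t), so the AGE_LOGT estimate shows how far the age effect drifts over follow-up. If any events occur at time 0, log(0) is undefined and a shifted form such as log(1 + timetoevent2) would be needed.

```sas
ods output ParameterEstimates=pe_tv;        /* keep the estimates for inspection */

proc phreg data=derived.analysis;
   where baselineage < 16;
   class var1 var2;
   model timetoevent2*event2censored(1) = baselineage var1 var2 age_logt / rl;
   /* Programming statements are evaluated at each event time, */
   /* so AGE_LOGT changes with time: an age-by-log(time) interaction. */
   age_logt = baselineage * log(timetoevent2);
run;
```

A coefficient for AGE_LOGT that is statistically significant but small in absolute terms would support the view that the departure is real but of little practical consequence; a large one would show that a single hazard ratio for age is a poor summary.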
var1 = binary race variable
var2 = binary variable indicating prior exposure to a different program
event2censored = 1 if censored, 0 if not censored
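proc phreg data=derived.analysis;
   class var1 var2;                                    /* binary race and prior-program indicators */
   /* (1) = value of event2censored that marks censoring; RL prints hazard-ratio confidence limits */
   model timetoevent2*event2censored(1) = baselineage var1 var2 / rl;
   where baselineage < 16;                             /* exclude children aged 16 and 17 at baseline */
run;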
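For the model above, the ASSESS statement in PROC PHREG provides a check of proportional hazards based on cumulative sums of martingale residuals: for each covariate it plots the observed process against simulated paths and reports a supremum test. This sketch assumes VAR1 and VAR2 are stored as numeric 0/1 variables, so they can enter without the CLASS statement; the seed value is arbitrary. With N of about 120,000 the resampling may be slow, and the supremum p-value carries the same large-sample caveat, so how far the observed path strays from the simulated ones is arguably the more informative output.

```sas
ods graphics on;

proc phreg data=derived.analysis;
   where baselineage < 16;
   /* assumed: var1 and var2 are numeric 0/1, so no CLASS statement is needed */
   model timetoevent2*event2censored(1) = baselineage var1 var2 / rl;
   /* cumulative martingale-residual check of PH for each covariate */
   assess ph / resample seed=20140506;
run;
```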