10-14-2015 10:14 AM
I have survival (time-to-event) data, and I want to fit a Cox proportional hazards model to it.
My question is how to check the basic assumptions, proportionality of the hazards and linearity of the covariates, in SAS EG, because
I have 106 independent variables and around 7 lakh (700,000) rows.
The data contain all types of variables: binary, continuous, and nominal.
My data also contain censored rows, so I want to know how the survival function estimate is calculated, i.e. the exact formula.
Please help; I am doing a project on this and I am tired of googling.
10-15-2015 09:34 AM
There is a SAS book that covers all of this. It's called
Survival Analysis Using SAS: A Practical Guide, by Paul D. Allison.
It's quite good. It was published in 2010, so it doesn't include some of the newer features of PHREG.
10-15-2015 11:45 AM - edited 10-16-2015 11:10 AM
Here are some methods you can use to check the assumption in the PH model, but they are all cumbersome when there are many independent variables. You can divide the time axis into pieces (two pieces, for instance) and then test the interaction between the time piece and the variable. It's a good idea to aggregate your data on risk sets and then use the type3 option to test the interaction (see this link). Otherwise you have to define the time dependence inside PHREG, which increases the calculation time a lot. Either way, you get a p-value for each of your independent variables.
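The in-PHREG variant of the two-piece check could be sketched roughly as follows. All names here are hypothetical and not from the original post: dataset `mydata`, time variable `t`, event indicator `status` (0 = censored), covariates `x1` and `x2`, and a cut point at 12 time units.

```sas
/* Sketch only: mydata, t, status, x1, x2 and the cut point 12 are assumed names/values. */
proc phreg data=mydata;
  model t*status(0) = x1 x2 x1_late;
  /* Programming statement, re-evaluated at every event time:
     x1_late equals x1 after the cut point and 0 before it. */
  x1_late = x1*(t > 12);
  /* Wald test of the time-by-covariate interaction;
     a small p-value suggests the effect of x1 changes over time. */
  ph_x1: test x1_late = 0;
run;
```

With many covariates this has to be repeated (or extended with one interaction term per covariate), which is where the extra computation time comes from.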
Another way to check the assumption is to use the ASSESS statement. It checks that the score process behaves as it should if the assumption is fulfilled. You get a plot of the score process for each parameter in the model, and the observed score process should look like the simulated curves (see the example below). Calculation time is proportional to the square of the number of events, so this will not be an option if you have a large dataset.
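data simulation;  /* the loop body and the MODEL line were lost from the post; x and t are assumed */
  do i=1 to 500; x=rand('bernoulli',0.5); t=rand('exponential')/exp(0.5*x); output; end;
run;
proc phreg data=simulation;
  model t=x;
  assess ph/npaths=100 resample=1;
run;
ods html close;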
It is worth mentioning that you can also plot the difference log(-log(S1(t))) - log(-log(S0(t))), where S0(t) and S1(t) are the Kaplan-Meier estimates for the two groups. The curve you get should be roughly constant if the assumption is fulfilled. This method only works with one variable at a time. Here is an example:
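data simulation;  /* loop body, BY, MODEL, STRATA, dif and PLOT lines were lost from the post and are assumed */
  do i=1 to 20000; group=rand('bernoulli',0.5); t=rand('exponential')/exp(0.5*group); output; end;
run;
proc sort data=simulation;
  by group;
run;
proc phreg data=simulation;
  model t=;       /* null model, so the baseline is the product-limit estimate per stratum */
  strata group;
  baseline out=baseline survival=survival/ method=pl;
run;
data dif;
  set baseline(in=a0 where=(group=0) rename=(survival=s0))
      baseline(in=a1 where=(group=1) rename=(survival=s1));
  by t;           /* interleave the two curves by time */
  retain lasts0 lasts1;
  if a0 then s1=lasts1;
  else if a1 then s0=lasts0;
  if a0 then lasts0=s0;
  else if a1 then lasts1=s1;
  if 0<s0<1 and 0<s1<1 then dif=log(-log(s1))-log(-log(s0));  /* undefined when S is 0 or 1 */
run;
symbol i=join v=none;
proc gplot data=dif;
  plot dif*t;
run;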
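For reference, here is a short sketch of the formulas behind this plot, written out under the usual textbook notation (the symbols $t_i$, $d_i$, $n_i$ and $\beta$ are my notation, not from the posts above). The Kaplan-Meier (product-limit) estimate multiplies, over the distinct event times $t_i$ up to $t$, the fraction of subjects at risk who survive each event time. Censored rows count in $n_i$ until their censoring time and then drop out of the risk set without counting as events. Under proportional hazards for a binary group, the log(-log) transform turns the hazard ratio into a constant vertical shift.

```latex
% Kaplan-Meier (product-limit) estimator:
% d_i = number of events at time t_i, n_i = number at risk just before t_i
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

% Proportional hazards between group 1 and group 0:
h_1(t) = e^{\beta} h_0(t)
\;\Longrightarrow\;
S_1(t) = S_0(t)^{e^{\beta}}

% Taking log(-log(.)) on both sides:
\log\bigl(-\log S_1(t)\bigr) - \log\bigl(-\log S_0(t)\bigr) = \beta
```

So if the assumption holds, the plotted difference should fluctuate around a horizontal line at the log hazard ratio; a clear trend over time points to non-proportional hazards.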
I will mention that I find it funny that there is so much focus on whether the hazards are proportional. Why is an interaction between an independent variable and time more important than an interaction between the independent variables themselves? That is because Cox's proportional hazards model is named after this assumption! The same assumption is made in the Poisson regression model, for instance, but reviewers rarely ask you to check proportional hazards when Poisson regression is used.