Programming the statistical procedures from SAS

proc phreg assumptions

New Contributor
Posts: 2

proc phreg assumptions

Dear All,

I have survival data, i.e. time-to-event data, and I want to fit a Cox proportional hazards model to it.

My question is how to check the basic assumptions, proportional hazards and linearity of the covariates, in SAS EG, because I have 106 independent variables and around 700,000 (7 lakh) rows.

The data contain variables of all types: binary, continuous, and nominal.

Also, my data contain censored rows, so I want to know how the survival function estimate is calculated, i.e. the exact formula.

Please help; I am doing a project on this and am tired of googling.

Community Manager
Posts: 504

Re: proc phreg assumptions

Hi rajan_reliance,

Thanks for your question! I've moved your post to the SAS Statistical Procedures community, where experts will be able to help you.

Anna

Trusted Advisor
Posts: 2,114

Re: proc phreg assumptions

There is a SAS book that covers all of this: Survival Analysis Using SAS: A Practical Guide, by Paul D. Allison.

https://www.sas.com/store/books/categories/usage-and-reference/survival-analysis-using-sas-a-practic...

It's quite good. It was published in 2010, so it doesn't include some of the newer features of PHREG.

Super Contributor
Posts: 287

Re: proc phreg assumptions


Hello,

Here are some methods you can use to validate the assumption in the PH model, but they all become cumbersome when there are many independent variables.

You can divide the time axis into pieces (two pieces, for instance) and then check the interaction between time and the variable. It's a good idea to aggregate your data by risk set and then use the TYPE3 option to test the interaction (see this link). Otherwise you can define the time dependence inside PHREG (which increases the computation time a lot). Either way, you get a p-value for each of your independent variables.
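A minimal sketch of the "time dependence inside PHREG" variant for a single covariate, assuming a 0/1 event indicator `status` (1 = event) and a covariate `group`. The data set `ph_check`, the cut point 5, and the names `group_late` and `group_logt` are hypothetical. Programming statements placed after the MODEL statement are evaluated at each event time, so `t` inside them is the current event time; a significant coefficient for the interaction term suggests the effect of `group` changes over time.

```sas
/* Hypothetical data: exponential event times with a constant hazard    */
/* ratio exp(0.5) for group=1, plus independent exponential censoring.  */
data ph_check;
  call streaminit(2024);
  do i = 1 to 2000;
    group  = rand('bernoulli', 0.5);
    tevent = 10 * exp(-0.5*group) * rand('exponential');
    tcens  = 20 * rand('exponential');
    t      = min(tevent, tcens);
    status = (tevent <= tcens);        /* 1 = event, 0 = censored */
    output;
  end;
  keep t status group;
run;

/* Variant 1: two pieces of the time axis, split at t = 5 (arbitrary). */
/* group_late equals group after the cut and 0 before it.              */
proc phreg data=ph_check;
  model t*status(0) = group group_late;
  group_late = group * (t > 5);
run;

/* Variant 2: interaction with log(time) instead of a step.            */
proc phreg data=ph_check;
  model t*status(0) = group group_logt;
  group_logt = group * log(t);
run;
```

Because these data are simulated with proportional hazards, the Wald test for `group_late` or `group_logt` should usually be non-significant. With 106 covariates, each one needs its own interaction term, and every term is recomputed for every subject in every risk set, which is why this gets slow on large data.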

 

Another way to check the assumption is the ASSESS statement. It checks whether the score process behaves as it should if the assumption is fulfilled. You get a plot of the score process for each parameter in the model, and the observed score process should look like the simulated curves (see the example below). Computation time is proportional to the square of the number of events, so with a large data set this will not be an option.

data simulation;
  call streaminit(123);               /* fixed seed for reproducible draws */
  do i=1 to 500;
    t=rand('exponential',10);         /* event times independent of group */
    group=rand('bernoulli',0.5);      /* binary covariate */
    output;
  end;
run;

ods html;
ods graphics on;

proc phreg data=simulation;
  class group/param=glm;
  model t=group;
  /* plot 100 simulated score paths; RESAMPLE (default 1,000 simulations) */
  /* gives the supremum test p-value, whereas RESAMPLE=1 uses only one    */
  assess ph/npaths=100 resample seed=123;
run;

ods graphics off;
ods html close;
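The original question also asked about linearity of the covariates. The ASSESS statement has a VAR= option that checks the functional form of continuous covariates with cumulative martingale residuals, in the same observed-versus-simulated-paths style. A minimal sketch, assuming a hypothetical data set `linearity` where the true log-hazard is quadratic in `x`, so a model that is linear in `x` should show an observed curve outside the simulated ones. The same quadratic cost in the number of events applies, so with around 700,000 rows this is probably only practical on a random subsample.

```sas
/* Hypothetical data: hazard = exp(0.5*x**2)/10, i.e. the log-hazard   */
/* is quadratic in x, so "model t = x" has the wrong functional form.  */
data linearity;
  call streaminit(7);
  do i = 1 to 500;
    x = rand('normal');
    t = 10 * exp(-0.5*x*x) * rand('exponential');
    output;
  end;
  keep t x;
run;

ods graphics on;
proc phreg data=linearity;
  model t = x;
  /* VAR= checks the functional form of x; PH also checks             */
  /* proportional hazards for x; RESAMPLE adds the supremum tests.    */
  assess var=(x) ph / npaths=20 resample seed=7;
run;
ods graphics off;
```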

 

It is also worth mentioning that you can plot log(-log(S(t))) for each group, where S(t) is the Kaplan-Meier estimate. If the assumption is fulfilled, the curves are parallel, so the difference log(-log(S0(t))) - log(-log(S1(t))) should be roughly constant over time. This method only works with one variable at a time. Here is an example that plots that difference:

data simulation;
  call streaminit(123);
  do i=1 to 20000;
    group=rand('bernoulli',0.5);
    t=rand('exponential',10*exp(-0.5*group));   /* hazard ratio exp(0.5) for group=1 */
    output;
  end;
run;

proc sort data=simulation;
  by group;
run;

/* Kaplan-Meier (product-limit) estimate per group: empty model, BY group */
proc phreg data=simulation;
  model t=;
  baseline out=baseline survival=survival/ method=pl;
  by group;
run;

data dif;
  set baseline(in=a0 where=(group=0) rename=(survival=s0))
      baseline(in=a1 where=(group=1) rename=(survival=s1));
  by t;
  retain lasts0 lasts1;
  /* carry the other group's last survival value forward */
  if a0 then s1=lasts1;
  else if a1 then s0=lasts0;
  /* log(-log(S)) is undefined at S=1 (e.g. t=0) and S=0 */
  if 0<s0<1 and 0<s1<1 then dif=log(-log(s0))-log(-log(s1));
  if a0 then lasts0=s0;
  else if a1 then lasts1=s1;
run;

proc sgplot data=dif;
  series x=t y=dif;   /* roughly flat under proportional hazards */
run;
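For reference, here is the Kaplan-Meier (product-limit) estimate that METHOD=PL gives for an empty model, which is also the censored-data formula asked about in the original question, together with the reason the difference should be flat. The t(i) are the distinct event times, d_i is the number of events at t(i), and n_i is the number of subjects still at risk just before t(i). A censored subject counts in n_i up to its censoring time and then leaves the risk set without contributing an event, which is how censoring enters the estimate.

```latex
\hat S(t) = \prod_{i:\; t_{(i)} \le t} \left( 1 - \frac{d_i}{n_i} \right)

h_1(t) = h_0(t)\, e^{\beta}
\;\Rightarrow\; S_1(t) = S_0(t)^{e^{\beta}}
\;\Rightarrow\; \log\{-\log S_0(t)\} - \log\{-\log S_1(t)\} = -\beta
```

In the simulated example, group 1's exponential times have mean 10*exp(-0.5) instead of 10 (reading the second RAND argument as the mean), so its hazard is e^0.5 times higher, beta = 0.5, and `dif` should fluctuate around -0.5.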

 

I will mention that I find it funny that there is so much focus on whether the hazards are proportional. Why is this interaction between some independent variables and time more important than interactions between the independent variables themselves? That is because Cox's proportional hazards model is named after this assumption! For instance, the same assumption is made in the Poisson regression model, but reviewers rarely ask for a check of proportional hazards when Poisson regression is used.
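To make the Poisson comparison concrete, a sketch assuming follow-up is split into time intervals k and the event counts d_jk for covariate pattern j are modelled with log person-time T_jk as an offset (the piecewise exponential model). Unless time-by-covariate interactions are added, the rate ratio e^beta is the same in every interval, just like the hazard ratio in Cox's model:

```latex
\text{Cox:}\quad \lambda(t \mid x) = \lambda_0(t)\, e^{x^{\top}\beta}

\text{Poisson:}\quad \log \mathrm{E}[d_{jk}] = \log T_{jk} + \alpha_k + x_j^{\top}\beta
\;\Longleftrightarrow\;
\frac{\mathrm{E}[d_{jk}]}{T_{jk}} = e^{\alpha_k}\, e^{x_j^{\top}\beta}
```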
