rajan_reliance
Calcite | Level 5

Dear All,

 

I have survival (time-to-event) data and want to fit a Cox proportional hazards model to it.

My question is how to check the basic assumptions of proportionality and linearity of the covariates in SAS EG, because I have 106 independent variables and around 7 lakh (700,000) rows.

The data contain all types of variables: binary, continuous, and nominal.

The data also contain censored rows, so I would like to know how the survival function estimate is calculated, i.e. the exact formula.

Please help. I am doing a project on this and am tired of googling.

AnnaBrown
Community Manager

Hi rajan_reliance,

 

Thanks for your question! I've moved your post to the SAS Statistical Procedures community where experts here will be able to help you.

 

Anna

Doc_Duke
Rhodochrosite | Level 12

There is a SAS Book that covers all of this.  It's called

Survival Analysis Using SAS:  A Practical Guide.  By Paul D. Allison

https://www.sas.com/store/books/categories/usage-and-reference/survival-analysis-using-sas-a-practic...

 

It's quite good.  It was published in 2010, so it doesn't include some of the newer features of PHREG.

JacobSimonsen
Barite | Level 11

Hello,

 

 

Here are some methods you can use to check the proportional hazards assumption, but they are all cumbersome when there are many independent variables. You can divide the time axis into pieces (two, for instance) and then test the interaction between the time period and each variable. It's a good idea to aggregate your data by risk set and then use the type3 option to test the interaction (see this link). Otherwise you have to define the time dependence inside PHREG, which increases the computation time a lot. Either way, you get a p-value for each of your independent variables.
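As a rough illustration of the second route (defining the time dependence inside PHREG), here is a minimal sketch. The cut point of 7 and the variable name group_late are assumptions chosen for illustration only, and the data are simulated without censoring and without any true group effect, so the interaction term should usually come out non-significant.

```sas
/* Simulated data: exponential event times with mean 10, no censoring */
data simulation;
  call streaminit(2024);             /* fixed seed so the run is repeatable */
  do i = 1 to 500;
    t = 10 * rand('exponential');    /* scale a standard exponential to mean 10 */
    group = rand('bernoulli', 0.5);
    output;
  end;
run;

proc phreg data=simulation;
  model t = group group_late;
  /* Programming statement: evaluated at every event time, so group_late */
  /* is the extra effect of group after the (hypothetical) cut point 7.  */
  group_late = group * (t > 7);
run;
```

The Wald test for group_late in the parameter estimates table is then the check of proportionality for group across the two time periods. With 106 covariates this has to be repeated per variable, which is why it gets slow on large data.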

 

Another way to check the assumption is the ASSESS statement. It checks that the score process behaves as it should when the assumption holds. You get a plot of the score process for each parameter in the model, and the observed process should look like the simulated curves (see the example below). Computation time is proportional to the square of the number of events, so with a large dataset this will not be an option.

 

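/* Simulate 500 uncensored event times with no group effect */
data simulation;
  do i=1 to 500;
    t=rand('exponential',10);
    group=rand('bernoulli',0.5);
    output;
  end;
run;

ods html;
ods graphics;

/* ASSESS PH plots the observed score process against simulated paths */
proc phreg data=simulation;
  class group/param=glm;
  model t=group;
  assess ph/npaths=100 resample=1;
run;

ods html close;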

 

It is also worth mentioning that you can plot log(-log(S(t))) for each group, where S(t) is the Kaplan-Meier estimate. The difference between the curves of the two groups should be roughly constant over time if the assumption is fulfilled. This method only works with one variable at a time. Here is an example:

 

 

/* Simulate two groups with proportional hazards (hazard ratio exp(0.5)) */
data simulation;
  do i=1 to 20000;
    group=rand('bernoulli',0.5);
    t=rand('exponential',10*exp(-0.5*group));
    output;
  end;
run;

proc sort data=simulation;
  by group;
run;

/* Product-limit (Kaplan-Meier) survival estimate for each group */
proc phreg data=simulation;
  class group/param=glm;
  model t=;
  baseline out=baseline survival=survival/ method=pl;
  by group;
run;

/* Interleave the two curves by time, carrying the last value of the other group forward */
data dif;
  set baseline(in=a0 where=(group=0) rename=(survival=s0))
      baseline(in=a1 where=(group=1) rename=(survival=s1));
  by t;
  retain lasts0 lasts1;
  if a0 then s1=lasts1;
  else if a1 then s0=lasts0;
  dif=log(-log(s0))-log(-log(s1));
  if a0 then lasts0=s0;
  else if a1 then lasts1=s1;
run;

/* The difference should look like a roughly horizontal line */
symbol i=join v=none;
goptions device=win;
proc gplot data=dif;
  plot dif*t;
run;
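
For reference, a short derivation of why the plotted difference should be flat. The notation is assumed here and does not come from the code: d_j is the number of events and n_j the number at risk at event time t_j, and beta is the log hazard ratio of group 1 versus group 0. With method=pl and no covariates, the survival estimate in each group is the product-limit (Kaplan-Meier) estimate, which handles censoring through the risk set sizes n_j.

```latex
% Product-limit (Kaplan-Meier) estimate within one group
\hat S(t) = \prod_{t_j \le t} \left( 1 - \frac{d_j}{n_j} \right)

% Proportional hazards for the two groups
h_1(t) = e^{\beta}\, h_0(t)
\;\Longrightarrow\;
S_1(t) = S_0(t)^{\,e^{\beta}}

% Taking log(-log(.)) on both sides
\log\bigl(-\log S_1(t)\bigr) = \beta + \log\bigl(-\log S_0(t)\bigr)
\;\Longrightarrow\;
\log\bigl(-\log S_0(t)\bigr) - \log\bigl(-\log S_1(t)\bigr) = -\beta
```

In the simulation above, group 1 has mean 10*exp(-0.5), so its hazard is exp(0.5) times that of group 0 and beta = 0.5. The variable dif should therefore fluctuate around -0.5 without a trend.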

 

I will add that I find it funny that there is so much focus on whether the hazards are proportional. Why is the interaction between an independent variable and time more important than the interactions among the independent variables? That is because Cox's proportional hazards model is named after this assumption! The same assumption is made in a Poisson regression model, for instance, but reviewers rarely ask for a check of proportional hazards when Poisson regression is used.
