Dylan1
Calcite | Level 5

I used PROC PHREG to identify the significant covariates for a Cox model, and I obtained them together with their hazard ratios. How do I then estimate the baseline hazard function, so that I can calculate the hazard rate and estimate survival probabilities on new data? I tried the BASELINE statement, but it only gives me cumulative baseline hazards at different times. I need a function of time that I can apply to new data to calculate survival probabilities.

Ksharp
Super User

Could you not use the XBETA= option to get the linear predictor and then exponentiate it to get the hazard ratio?

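/* Covariate patterns to predict for: one row per pattern */
data covs;
  /*format gender gender.;*/
  input gender age;
  datalines;
0 69.845947
1 69.845947
;
run;

/* Fit the Cox model; BASELINE writes the linear predictor for each row of covs */
proc phreg data=whas500 plots(overlay)=(survival);
  class gender;
  model lenfol*fstat(0) = gender age;
  baseline covariates=covs out=base xbeta=xbeta / rowid=gender;
run;

/* Exponentiate the linear predictor to get the relative hazard */
data base;
  set base;
  hazardratio = exp(xbeta);
run;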
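
If the goal is survival probabilities for new covariate patterns rather than only the hazard ratio, the same BASELINE statement can also be asked for them. The following is only a sketch that reuses the whas500 and covs data sets from the code above; the output names pred, surv, and cumhaz are illustrative choices, not something from the original post.

```sas
/* Sketch: request survival and cumulative hazard, in addition to   */
/* the linear predictor, for every row of covs.                     */
/* pred, surv and cumhaz are illustrative names.                    */
proc phreg data=whas500;
  class gender;
  model lenfol*fstat(0) = gender age;
  baseline covariates=covs out=pred
           survival=surv cumhaz=cumhaz xbeta=xbeta / rowid=gender;
run;

/* pred holds one row per covariate pattern per event time;         */
/* surv is the estimated survival probability for that pattern.     */
proc print data=pred(obs=10);
  var gender age lenfol surv cumhaz xbeta;
run;
```

Putting the new subjects' covariate values in the COVARIATES= data set is usually the most direct way to score them, because PHREG then combines the baseline and the linear predictor for you.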
Dylan1
Calcite | Level 5
Thank you for the response and for showing how to calculate the hazard ratios. It is still not clear to me, though, how to derive the baseline hazard function that I would then combine with the linear predictors to make predictions on new data.
SAS_Rob
SAS Employee

Without more details, it is hard to say whether you can do this or not. It depends on what you mean when you say you want to predict for new times.

Are you looking to predict at new survival times, that is, to extrapolate beyond the event times already contained in your data? If so, the proportional hazards model is not an appropriate model for this. It would simply extend the last survival estimate beyond the last event time in the model.

If you have "new" times within the range of the event times used to build the model, then you can estimate the survival at these new times by adding the new observations to the DATA= data set. Define a new variable on the DATA= data set and name it in the FREQ statement of PHREG. Give this variable a value of zero for the new observations and a value of one for the original observations. Note that you will, of course, need to provide covariate values for the new observations. The survival estimates for these new observations will be in the OUTPUT data set.
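
A minimal sketch of the FREQ approach described above, assuming the whas500 data and model used earlier in this thread. The data set new_obs, the weight variable w, and the covariate and time values are hypothetical and only for illustration.

```sas
/* Hypothetical new subjects: covariates plus the time of interest. */
data new_obs;
  input gender age lenfol;
  fstat = 0;               /* placeholder status; these rows get w = 0 */
  datalines;
0 65 365
1 72 730
;
run;

/* Stack the original and new observations and flag them. */
data combined;
  set whas500(in=in_fit) new_obs;
  w = in_fit;              /* 1 = original observation, 0 = new observation */
run;

/* Rows with w = 0 are not used to estimate the parameters. */
proc phreg data=combined;
  class gender;
  model lenfol*fstat(0) = gender age;
  freq w;
  output out=scored survival=surv xbeta=xbeta;
run;

/* Keep only the scored new observations. */
data scored_new;
  set scored;
  where w = 0;
run;
```

Each row of scored_new should then carry the survival estimate at that row's own lenfol value, so the time supplied for a new subject determines where on the curve it is evaluated.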

Dylan1
Calcite | Level 5

Thank you, Rob. Given that the BASELINE statement is computationally expensive, is there a way to get the baseline survival probabilities for all the records in my dataset? The one I am using has 50,000 records, with 60 unique times to the event, but the BASELINE statement gives me survival probabilities at only 45 of those 60 times.

Is there a way to output the survival probabilities for the whole dataset in one go?

SAS_Rob
SAS Employee

It would be helpful to see your actual code, especially if you have used the TIMELIST= option.

By default, PHREG gives you the survival probabilities at each unique event time, not at every unique time in your data, so times at which only censoring occurred are not listed. How many unique event times do you have?
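
For reference, a hedged sketch of how TIMELIST= might be used to ask for estimates at chosen times instead of the default event times. It assumes the whas500 and covs data sets from earlier in the thread; the time points 365, 730, and 1095 and the name pred_at are made up for illustration. Availability of TIMELIST= outside Bayesian analyses may depend on the SAS/STAT release, so check the documentation for your version.

```sas
/* Sketch: survival estimates at user-chosen times.                */
/* The time points and the output name pred_at are illustrative.   */
proc phreg data=whas500;
  class gender;
  model lenfol*fstat(0) = gender age;
  baseline covariates=covs out=pred_at survival=surv
           timelist=365 730 1095 / rowid=gender;
run;
```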

Season
Barite | Level 11

I think some post-processing of the output is needed to get what you want. Since SAS has already given you the cumulative hazard at each time point, you can subtract the cumulative hazard just before a given time from the one at that time to obtain the estimated baseline hazard at that time.

In formulae, suppose we want to estimate h0(t1), and t0 is the latest event time before t1. Denote the cumulative hazard estimators output by SAS as H(.)_hat, so what you get from SAS is H(t1)_hat and H(t0)_hat. Use the XBETA= option in the BASELINE statement to get the linear predictor. Then H0(t1)_hat = H(t1)_hat / exp(XBETA) and H0(t0)_hat = H(t0)_hat / exp(XBETA). Therefore, h0(t1)_hat = H0(t1)_hat - H0(t0)_hat.
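
The differencing step can be sketched in a DATA step as follows. This assumes the whas500 and covs data sets from earlier in the thread; the names ch, H0, and h0_jump are illustrative.

```sas
/* Request the cumulative hazard and linear predictor per pattern. */
proc phreg data=whas500;
  class gender;
  model lenfol*fstat(0) = gender age;
  baseline covariates=covs out=ch cumhaz=cumhaz xbeta=xbeta / rowid=gender;
run;

/* Rescale to the baseline and difference consecutive time points. */
data h0_steps;
  set ch;
  by gender notsorted;
  H0 = cumhaz / exp(xbeta);           /* baseline cumulative hazard   */
  h0_jump = dif(H0);                  /* H0(t1)_hat - H0(t0)_hat      */
  if first.gender then h0_jump = H0;  /* no earlier time in the group */
run;
```

Because the baseline is shared by all covariate patterns, H0 and h0_jump should come out the same for both values of gender, which is a convenient sanity check.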

This method works fine in the absence of ties. When there are ties, which is a common scenario, I think directly computing the Breslow estimator of the baseline hazard function can be more helpful.

[Image: formula for the Breslow estimator of the baseline hazard function]

In the formula, di is an indicator of whether an observation experienced an event at ti, and R(ti) is the risk set at ti. In other words, the denominator is the sum of the exponentiated linear predictors, exp(XBETA), over all subjects in the risk set at ti.
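
Since the image is not available here, the following is the standard textbook form of the Breslow estimator, written with the symbols described above. It may differ in notation from the original image. The last line shows how the baseline combines with the linear predictor to give a survival probability for a new covariate vector x.

```latex
\hat{h}_0(t_i) = \frac{d_i}{\sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\hat{\beta}\right)},
\qquad
\hat{H}_0(t) = \sum_{i:\, t_i \le t} \hat{h}_0(t_i),
\qquad
\hat{S}(t \mid x) = \exp\!\left\{-\hat{H}_0(t)\,\exp\!\left(x^{\top}\hat{\beta}\right)\right\}
```

With tied event times, the contributions of all observations with an event at the same ti add up, which is equivalent to using the number of events at ti in the numerator.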
