Suppose I have fitted a model using PROC PHREG, and I now want to compute the survival probabilities. Why do I need to compute them if SAS already outputs them? Because we will need to compute the fitted model's predicted survival probabilities on new data, and the way our computer systems are set up, I can't simply hand over the SAS code and say "use PROC PLM". The calculation of survival probabilities will be done outside of SAS, so I need this other computer system to replicate the exact formula.
I get the feeling I'm missing something simple, but I can't figure it out.
So, I make up some stupid data, run PROC PHREG to get a stupid model on the stupid data, and obtain the survival probabilities computed by PROC PHREG. Can someone then guide me through the calculations?
Here's the made-up data:
data class;
set sashelp.class;
call streaminit(02785);
bad_rand=rand('uniform');
if bad_rand>0.6 then bad=1;
else bad=0;
if bad=1 then time_to_bad=rand('integer',5,10);
else time_to_bad=10;
run;
Here's PROC PHREG using just one x-variable to make it simple.
proc phreg data=class;
ods output parameterestimates=parms;
model time_to_bad*bad(0)=height;
baseline covariates=class out=baseline cumhaz=cumhaz logsurv=logsurv timelist=5 to 10 by 1 xbeta=xbeta;
output out=survival survival=survival ;
run;
And now here's the output in data set SURVIVAL.
Here is the model output
Here is additional info in the data set BASELINE; I don't know if this is useful to us here.
How is the survival probability for Alfred of 0.914562597 at time_to_bad=6 computed from the model outputs?
Paige Miller
If you want to get this done outside of SAS, then you need to know the algorithm behind this model in great detail.
I think that is a big obstacle for most of us.
Otherwise, you wouldn't need SAS anymore; you could build the survival model in Java, Python, C, C#, or other languages.
The Cox model is directly modeling hazards, not probabilities, so calculation of survival probabilities from Cox models is a bit difficult.
More specifically, this calculation may entail specifying the baseline survival function. I have seen procedures that adopt a Weibull model as the baseline survival function. See section 20.7.4 (or more specifically, page 422) of Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating | Springer....
But admittedly, this process was only mentioned briefly in the material I cited. The author provided no explanation of the rationale for choosing the Weibull model as the baseline survival function. However, this part was in fact a summary of earlier work by one of the author's colleagues, which has already been published: Validation, calibration, revision and combination of prognostic survival models - van Houwelingen - ....
By the way, there is a monograph on building survival models (e.g., Cox models) for predicting quantities such as survival probabilities. This is in fact the only monograph on this very specific topic I have found: Dynamic Prediction in Clinical Survival Analysis (Chapman & Hall/CRC Monographs on Statistics and Ap....
Thanks, I will read these documents. UPDATE: I guess I have to buy the books first.
I am confused by this:
The Cox model is directly modeling hazards, not probabilities, so calculation of survival probabilities from Cox models is a bit difficult.
Clearly, SAS is computing survival probabilities, which is why I asked the question in the first place.
Paige Miller
The formulas for the survival probabilities are given in the Details => Survivor Function Estimators section of the PROC PHREG documentation. These formulas are not easily replicated outside of PHREG and we rarely, if ever, get asked about them. The only one that seems tractable is for METHOD=PL (product limit). This estimate is S(t) = So(t)**exp(XBETA), where So(t) is the baseline survival function (more on this later) and XBETA is the linear predictor, XBETA = b1*x1 + b2*x2 + ... + bk*xk. The b1, ..., bk are the coefficients and x1, ..., xk are the covariate values for a particular observation.
The baseline survival function So(t) can be had using the BASELINE statement and OUT= data set where the COVARIATES= data set/option has a zero value for all of the covariates. Then you can manually put together S(t) = So(t)**exp(XBETA).
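For the system outside of SAS, the manual step described above might look something like the following Python sketch. It is only an illustration under stated assumptions: the function names are made up here, the coefficient and baseline values are placeholders rather than numbers from this thread's output, and it assumes the baseline survival estimates have been exported from the OUT= data set as a list of event times with the corresponding So(t) values. The baseline is treated as a step function, so for a time between event times the sketch uses the value at the latest event time not exceeding t. The optional `xbeta_ref` argument covers the case where the exported curve was computed at some reference covariate setting instead of all zeros, in which case the exponent becomes exp(XBETA - XBETA_ref).

```python
import math
from bisect import bisect_right


def linear_predictor(coefs, covariates):
    """XBETA = b1*x1 + b2*x2 + ... + bk*xk."""
    return sum(b * x for b, x in zip(coefs, covariates))


def step_lookup(event_times, surv_values, t):
    """Baseline survival at time t, treating the estimate as a step function.

    event_times must be sorted ascending; before the first event time
    the survival is taken to be 1.
    """
    i = bisect_right(event_times, t)
    return 1.0 if i == 0 else surv_values[i - 1]


def survival_prob(s_ref, xbeta, xbeta_ref=0.0):
    """S(t) = So(t) ** exp(XBETA - XBETA_ref).

    With xbeta_ref = 0 this is the formula S(t) = So(t)**exp(XBETA),
    where So(t) comes from a COVARIATES= data set with all zeros.
    """
    return s_ref ** math.exp(xbeta - xbeta_ref)


# Placeholder inputs (hypothetical, NOT taken from the PHREG output above)
b_height = -0.05                      # assumed coefficient for height
base_times = [5, 6, 7, 8, 9, 10]      # event times from the OUT= data set
base_surv = [0.60, 0.45, 0.40, 0.30, 0.25, 0.20]  # So(t) at height = 0

xbeta = linear_predictor([b_height], [69.0])      # one subject, height = 69
s0 = step_lookup(base_times, base_surv, 6)
print(survival_prob(s0, xbeta))
```

To check a port like this, compare its result for a few observations against the SURVIVAL= values that PROC PHREG writes out, using the actual parameter estimate and baseline values from your own run.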
Note that the BASELINE statement is typically used in conjunction with the PLOTS= option to produce survival curves. These curves are produced at the settings given in the COVARIATES= data set. The COVARIATES= data set typically only has a few observations used for plotting particular survival curves. See Example xx.8 Survival Curves in the PHREG documentation (Examples section) for an illustration. The BASELINE statement produces estimates at the event times in the data.
Also note that if you are using the model (t1, t2) counting-process syntax then the BASELINE statement enforces METHOD=Breslow regardless of what METHOD= option you specify. I think a NOTE is written to the log in this case.