Statistical Procedures

Programming the statistical procedures from SAS
PaigeMiller
Diamond | Level 26

Let's suppose I have fitted a model using PROC PHREG, and I now want to compute the survival probabilities. Why do I need to compute them if they are output by SAS? Because we will need to compute the fitted model's predicted survival probabilities on new data, and the way our computer systems are set up, I can't simply hand over the SAS code and say "use PROC PLM". The calculation of survival probabilities will be done outside of SAS, so I need this other computer system to replicate the exact formula.

 

I get the feeling I'm missing something simple, but I can't figure it out. 

 

So, I made up some stupid data, ran PROC PHREG to get a stupid model on the stupid data, and obtained the survival probabilities computed by PROC PHREG. Can someone guide me through the calculations?

 

Here's the made up data:

 

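data class;
    set sashelp.class;
    /* Seed the random-number stream so the made-up data is reproducible */
    call streaminit(02785);
    bad_rand=rand('uniform');
    /* Roughly 40% of observations become events (bad=1); the rest are censored */
    if bad_rand>0.6 then bad=1;
    else bad=0;
    /* Events occur at a random integer time from 5 to 10; censored observations stop at 10 */
    if bad=1 then time_to_bad=rand('integer',5,10);
    else time_to_bad=10;
run;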

 

Here's PROC PHREG using just one x-variable to make it simple.

 

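proc phreg data=class;
    /* Save the coefficient estimates */
    ods output parameterestimates=parms;
    /* bad=0 marks censored observations */
    model time_to_bad*bad(0)=height;
    /* Predicted quantities for each row of CLASS at times 5 through 10 */
    baseline covariates=class out=baseline cumhaz=cumhaz logsurv=logsurv timelist=5 to 10 by 1 xbeta=xbeta;
    /* Survival estimate for each observation at its own time_to_bad */
    output out=survival survival=survival;
run;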


And now here's the output in data set SURVIVAL.

 

PaigeMiller_0-1731514562179.png

 

Here is the model output

 

PaigeMiller_1-1731514619013.png

 

Here is additional info in the data set BASELINE, I don't know if this is useful to us here.

 

PaigeMiller_2-1731514684754.png

 

 

How is the survival probability for Alfred of 0.914562597 at time_to_bad=6 computed from the model outputs?

--
Paige Miller
6 REPLIES
Ksharp
Super User
That is the reason why we need statistical software like R, Stata, or SAS.
If you want to get it done outside of SAS, then you need to know the algorithm behind this model in great detail.
I think that is a big obstacle for most of us.
Otherwise, you wouldn't need SAS anymore; you could build survival models in Java, Python, C, C#, or other languages.
Season
Barite | Level 11

The Cox model is directly modeling hazards, not probabilities, so calculation of survival probabilities from Cox models is a bit difficult.

More specifically, this calculation may entail specifying the baseline survival function. I have seen approaches that adopt a Weibull model as the baseline survival function. See section 20.7.4 (or more specifically, page 422) of Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating | Springer....

Admittedly, this process is only mentioned briefly in the material I cited. The author provides no rationale for choosing the Weibull model as the baseline survival function. However, that part is in fact a summary of earlier work by one of the author's colleagues, which has already been published: Validation, calibration, revision and combination of prognostic survival models - van Houwelingen - ....

By the way, there is a monograph on building survival models (e.g., Cox models) for prediction of issues like survival probabilities. This is in fact the only monograph on this very specific topic I have found: Dynamic Prediction in Clinical Survival Analysis (Chapman & Hall/CRC Monographs on Statistics and Ap....

PaigeMiller
Diamond | Level 26

Thanks, I will read these documents. UPDATE: I guess I have to buy the books first.

 

I am confused by this:

 

The Cox model is directly modeling hazards, not probabilities, so calculation of survival probabilities from Cox models is a bit difficult.

 

Clearly, SAS is computing survival probabilities, which is why I asked the question in the first place.

--
Paige Miller
Season
Barite | Level 11
Take a look at the "METHOD=" option in the BASELINE statement. It clearly says that the Breslow method is used for survivor function estimation by default. Since you did not specify the "METHOD=" option, that is the method for your calculation.
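For anyone who has to reproduce this outside SAS, here is a minimal sketch of the textbook Breslow estimator. It accumulates a baseline cumulative hazard H0(t) over the distinct event times (number of events at that time divided by the sum of exp(xbeta) over everyone still at risk), and then S(t | x) = exp(-H0(t) * exp(xbeta)). Python is used only as a stand-in for the external system, and the function names and toy inputs are hypothetical. Whether this matches PHREG digit for digit (tie handling, option defaults) is an assumption that should be checked against the Survivor Function Estimators section of the documentation.

```python
import math

def breslow_cumulative_baseline_hazard(times, events, xbeta):
    """Return a list of (event_time, H0) pairs in increasing time order.

    times  : observed time for each training subject
    events : 1 if the subject had the event, 0 if censored
    xbeta  : fitted linear predictor for each training subject
    """
    risk_scores = [math.exp(v) for v in xbeta]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    h0 = 0.0
    steps = []
    for t in event_times:
        # number of events observed exactly at time t
        d = sum(1 for tj, ej in zip(times, events) if ej == 1 and tj == t)
        # sum of exp(xbeta) over subjects still at risk at time t
        denom = sum(r for tj, r in zip(times, risk_scores) if tj >= t)
        h0 += d / denom
        steps.append((t, h0))
    return steps

def survival_at(steps, t, xbeta_new):
    """S(t | x) = exp(-H0(t) * exp(xbeta_new)), with H0 a step function."""
    h0 = 0.0
    for event_time, value in steps:
        if event_time <= t:
            h0 = value
    return math.exp(-h0 * math.exp(xbeta_new))

# Toy inputs (hypothetical, not the CLASS data from this thread)
steps = breslow_cumulative_baseline_hazard(
    times=[1, 2, 3], events=[1, 1, 0], xbeta=[0.0, 0.0, 0.0]
)
print(survival_at(steps, 2, 0.0))
```

With all linear predictors set to zero, the increments reduce to 1/3 and 1/2, which makes the toy case easy to check by hand.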
OsoGris
SAS Employee

The formulas for the survival probabilities are given in the Details => Survivor Function Estimators section of the PROC PHREG documentation. These formulas are not easily replicated outside of PHREG, and we rarely, if ever, get asked about them. The only one that seems tractable is for METHOD=PL (product limit). This estimate is S(t) = So(t)**exp(XBETA), where So(t) is the baseline survival function (more on this later) and XBETA is the linear predictor, XBETA = b1*x1 + b2*x2 + ... + bk*xk. The bi are the coefficients and the xi are the covariate values for a particular observation.

 

The baseline survival function So(t) can be had using the BASELINE statement and OUT= data set where the COVARIATES= data set/option has a zero value for all of the covariates.  Then you can manually put together S(t) = So(t)**exp(XBETA).  
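As a rough illustration of that last step outside SAS, the sketch below applies S(t) = So(t)**exp(XBETA) to a single observation. Python is only a stand-in for the external system. The function name and the numbers are hypothetical; in practice So(t) would come from the BASELINE OUT= data set with all covariates set to zero, and the coefficients from the parameter estimates table.

```python
import math

def survival_from_baseline(s0_t, coefficients, covariates):
    """Return S(t) = So(t) ** exp(XBETA) for one observation.

    s0_t         : baseline survival So(t) at the time of interest
    coefficients : fitted b1..bk
    covariates   : x1..xk for the observation being scored
    """
    # linear predictor XBETA = b1*x1 + ... + bk*xk
    xbeta = sum(b * x for b, x in zip(coefficients, covariates))
    return s0_t ** math.exp(xbeta)

# Hypothetical values, not taken from the fitted model in this thread
print(survival_from_baseline(0.5, [math.log(2)], [1.0]))
```

Because So(t) is defined at covariates equal to zero, a covariate such as height that never gets near zero can push So(t) very close to 0 or 1, so it is worth carrying full precision when the baseline values are exported.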

 

Note that the BASELINE statement is typically used in conjunction with the PLOTS= option to produce survival curves.  These curves are produced at the settings given in the COVARIATES= data set.   The COVARIATES= data set typically only has a few observations used for plotting particular survival curves.  See Example xx.8 Survival Curves in the PHREG documentation (Examples section) for an illustration.  The BASELINE statement produces estimates at the event times in the data. 

 

Also note that if you are using the model (t1, t2) counting-process syntax then the BASELINE statement enforces METHOD=Breslow regardless of what METHOD= option you specify.  I think a NOTE is written to the log in this case. 

PaigeMiller
Diamond | Level 26

Thanks to both @OsoGris and @Season, I think this will get me started. My team will likely decide, though, that these formulae are too complicated for others to program from scratch in C++, and we might want to look for simpler approaches.

--
Paige Miller
