Aleks1234
Calcite | Level 5

Given a Cox regression model, as below, I would like to 'score' (assign model predictions to) the dataset whas500, or any other dataset. Specifically, I would like to get predicted survival probabilities, that is, Pr(T > t) for any t. How do I alter PROC PLM in order to do so? Or is there another way?

The data set can be downloaded from here: https://stats.idre.ucla.edu/wp-content/uploads/2016/02/whas500.sas7bdat

 

libname a 'Dir where the SAS dataset resides';

   proc format library=work;
    value gender
          0 = "Male"
          1 = "Female";
    run;
    
    data whas500;
    set a.whas500;
    format gender gender.;
    run;
    
    
    proc phreg data = whas500;
    class gender;
    model lenfol*fstat(0) = gender age;
    store work.s;
    run;
    
    
    proc plm restore=work.s;
       score data=whas500 out=testout predicted / ilink;
    run;
Rick_SAS
SAS Super FREQ

I suggest you use PROC UNIVARIATE on the predicted percentiles. By default, you get a table for the 99th, 95th, 90th, ..., 5th, and 1st percentiles. You can also use the CDFPLOT statement to plot the cumulative distribution, which is a nice way to visualize the results. Lastly, you can use the OUTPUT statement with the PCTLPTS= option to specify any value(s) of t. For example, the following writes an output data set that has estimates for the unit percentiles:

proc univariate data=testout;
var Predicted;
cdfplot Predicted; /* visualize the percentiles */
ods select Quantiles CDFPlot;
output out=Pctls pctlpts=(0 to 100) pctlpre=P_;
run;

/* pctls in wide format */
proc print data=Pctls;
run;

The output data is in the wide format. If you need it in the long format, you can transpose or you can use PROC STDIZE to get the long form directly:

 

/* pctls in long format */
proc stdize data=testout PctlMtd=ORD_STAT outstat=LongPctls
           pctlpts=(0 to 100);
var Predicted;
run;
 
proc print data=LongPctls(obs=10) noobs;
where _type_ =: 'P';
run;
Rick_SAS
SAS Super FREQ

I guess technically I showed how to get Pr(T <= t). You can subtract this result from 1 to get Pr(T > t).

Aleks1234
Calcite | Level 5

Hi Rick,

 

Thanks for your answer.

However, the code below does not produce probabilities. Do you know why, and how I can obtain them?

proc plm restore=work.s;
score data=whas500 out=testout predicted / ilink;
run;

 

Rick_SAS
SAS Super FREQ

Are there any error messages in the log? When I run your code on SAS 9.4M6, I get an output data set that contains predicted values:

proc contents data=testout short varnum;
run;
Variables in Creation Order
ID AGE GENDER HR SYSBP DIASBP BMI CVD AFB SHO CHF AV3 MIORD MITYPE YEAR LOS DSTAT LENFOL FSTAT Predicted

 

I can also use PROC PRINT:

 

proc print data=testout(obs=5);
var ID Gender Age lenfol Predicted;
run;
Obs  ID  GENDER  AGE  LENFOL  Predicted
  1   1  Male     83    2178    5.54666
  2   2  Male     49    2172    3.27453
  3   3  Female   70    2190    4.61235
  4   4  Male     70     297    4.67790
  5   5  Male     70    2131    4.67790

 

I suppose it is possible that you are running an old version of SAS that does not support some options. What does the log display when you submit

 

%put &=SYSVLONG4;

 

Aleks1234
Calcite | Level 5
Hi,
Yes, I get the same dataset. However, I doubt these are probabilities (5.54? 3.27?). Or are they expressed in %?

Rick_SAS
SAS Super FREQ

No. If you want the predicted probability of survival for each subject with a given combination of (age, Gender, Lenfol), then I think you should use the OUTPUT statement with the SURVIVAL keyword:

 

proc phreg data = whas500;
   class gender;
   model lenfol*fstat(0) = gender age;
   output out=testout survival=PredSurv;
run;

proc sgplot data=testout;
   scatter x=Age y=PredSurv / group=Gender;
run;
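
A possible reading of the non-probability values shown earlier, offered as a sketch rather than a statement about PROC PLM internals: the printed Predicted values look like the linear predictor $x^\top\hat\beta$ (both 70-year-old males get 4.67790, and the value rises with age). Under the Cox model, the survival probability also depends on the baseline survivor function $S_0(t)$:

```latex
$$
S(t \mid x) \;=\; \Pr(T > t \mid x) \;=\; S_0(t)^{\exp(x^\top \beta)}
$$
```

So a linear predictor alone cannot be converted to Pr(T > t) without an estimate of $S_0(t)$, which PROC PHREG estimates from the data used to fit the model.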

 

Aleks1234
Calcite | Level 5
OK, at what time are these survival probabilities evaluated?

If I wish to score "new data" and obtain survival probabilities at different times t = t_1, t_2, t_3, how do I choose these times?

Do I need to rerun PROC PHREG with data=whas500 every time I score new data? Can it be done independently of data=whas500, e.g. in a PROC PLM step that depends only on the parameter estimates and not on the underlying data?
Rick_SAS
SAS Super FREQ

It is the predicted survival probability for the specified value of LENFOL.

 

I don't understand what you are trying to accomplish, so please read the SAS doc for the following statements:

The BASELINE statement gives a predicted survival curve for each patient:

proc phreg data = whas500;
   class gender;
   model lenfol*fstat(0) = gender age;
   baseline covariates=whas500 out=outAll survival=PredSurv timepoint=0 to 50 by 5;
run;

proc sgplot data=outAll noautolegend;
   series x=LenFol y=PredSurv / group=ID lineattrs=GraphData1;
run;
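
To score new subjects at a few chosen times, the same BASELINE approach can be pointed at a separate covariate data set. The following is a minimal sketch, not a definitive recipe. The data set newpatients, its values, and the time points 365, 730, and 1825 (assuming LENFOL is measured in days) are hypothetical, and it assumes a SAS/STAT release in which the TIMELIST= option is available for non-Bayesian analyses. The model is still fit on whas500 in the same step, because the baseline survivor function is estimated from that data.

```sas
/* Hypothetical new subjects to score; the values are illustrative only. */
/* The same GENDER. format is applied so the CLASS levels match.         */
data newpatients;
   input id gender age;
   format gender gender.;
   datalines;
1 0 60
2 1 75
;
run;

/* Fit the model on whas500 and request Pr(T > t) for each new subject   */
/* at the listed times (assumed to be in days, like LENFOL).             */
proc phreg data=whas500;
   class gender;
   model lenfol*fstat(0) = gender age;
   baseline covariates=newpatients out=newpred survival=PredSurv
            timelist=365 730 1825;
run;

/* One row per subject and requested time point */
proc print data=newpred;
run;
```

If this runs as intended, newpred holds one predicted survival probability per subject per requested time, with the requested time stored in the LENFOL variable.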

 

 
