Chaupak
Obsidian | Level 7

Hi, I have a question about estimating standardized survival curves in PROC PHREG. I want to explore the effect of a treatment (trt = 1 or 0), adjusting for some covariates in the Cox model, and then estimate a survival curve for each treatment that is adjusted (standardized) for the covariates in the model. I tried to use the BASELINE statement for this, because I know this statement can produce the survival curve for a specific set of covariate values. This is the sample code:

proc phreg data=sampledata;
	model time*event(0) = trt age bmi female black;
run;

trt is the treatment (0 or 1), age and bmi are continuous variables, and female and black are binary variables. I found that there are two ways to estimate the survival curves with the BASELINE statement.

The first is to supply a set of mean values for the covariates in the model through the COVARIATES= option:

proc phreg data=sampledata;
	model time*event(0) = trt age bmi female black;
	baseline out=pred covariates=bsl_cov survival = _all_ /  rowid=trt;
run;

Data set "bsl_cov" has two observations: trt = 1 with the mean values of all other covariates, and trt = 0 with the mean values of all other covariates. This gives me one survival curve for each treatment.
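A minimal sketch of how "bsl_cov" could be built, assuming the pooled (whole-sample) means of age, bmi, female and black are the values wanted. The intermediate data set name cov_means is hypothetical.

```sas
/* Pooled means of the adjustment covariates (cov_means is a hypothetical name).
   MEAN= with no names keeps the original variable names. */
proc means data=sampledata noprint;
	var age bmi female black;
	output out=cov_means(drop=_type_ _freq_) mean=;
run;

/* Two rows, trt = 0 and trt = 1, each carrying the same covariate means */
data bsl_cov;
	set cov_means;
	do trt = 0 to 1;
		output;
	end;
run;
```

Note that the "mean" of female and black computed this way is simply the sample proportion of patients with that value equal to 1.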

The second method is the direct adjusted survival curve, requested with the DIRADJ option in the BASELINE statement. According to the SAS documentation, a direct adjusted survival curve is "computed for each value of variable in the input data. The variable does not have to be a variable in the COVARIATES= data set. Each direct adjusted survival curve is the average of the survival curves of all individuals in the COVARIATES= data set with their value of variable set to a specific value". Here, variable is the one named in the GROUP= option. The code looks like this:

proc phreg data=sampledata;
	class trt;
	model time*event(0) = trt age bmi female black;
	baseline out=pred covariates=sampledata / group=trt diradj;
run;

Here, "sampledata" in the BASELINE statement is the same input data set used by PHREG, which includes observations for all patients.
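In formula terms (my own notation, not SAS output): under the Cox model S(t | x) = S_0(t)^exp(x'b), write z for the adjustment covariates (age, bmi, female, black), beta for the trt coefficient, gamma for the coefficients of z, and n for the number of patients in the COVARIATES= data set. For treatment level g, the first method evaluates one curve at the covariate means, while DIRADJ averages the n individual predicted curves with trt set to g:

```latex
\hat S^{(1)}_g(t) = \hat S_0(t)^{\exp(\hat\beta g + \hat\gamma^\top \bar z)}
\qquad
\hat S^{(2)}_g(t) = \frac{1}{n}\sum_{i=1}^{n} \hat S_0(t)^{\exp(\hat\beta g + \hat\gamma^\top z_i)}
```

The first is the curve of a single hypothetical "average" patient; the second averages over the observed covariate distribution, so the two generally differ whenever the covariates vary.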

My main question is: if I want survival curves standardized for the covariates, which method should I use? What is the essential difference between the two methods? And if I use the first method, does it make sense to use the mean of a binary variable in the covariates data set?

SteveDenham
Jade | Level 19

I don't know which will give you what you want, but I think that if you add female and black to your CLASS statement, the results should automatically handle the categorical nature of those variables. It may be that this gives separate curves for each row of the 2x2 design (female x black), with the curves adjusted to the mean value within each row. Getting separate curves for each cell would probably require fitting the interaction term.

But I am just sort of spitballing on this one.

SteveDenham

Chaupak
Obsidian | Level 7

Thanks. I checked the official documentation, and I do need to specify the values of female and black in the covariates data set. For a stratified analysis, it seems I do need an interaction term.
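One way to specify explicit values of female and black is a small COVARIATES= data set with one row per trt x female x black combination. This is a sketch: bsl_profiles and pred_profiles are hypothetical names, and the fixed age and bmi values are placeholders chosen only for illustration, not taken from the data.

```sas
/* Hypothetical covariate profiles: all 8 trt x female x black combinations.
   age and bmi are placeholder values; substitute values meaningful for your data. */
data bsl_profiles;
	age = 60;
	bmi = 27;
	do trt = 0 to 1;
		do female = 0 to 1;
			do black = 0 to 1;
				output;
			end;
		end;
	end;
run;

/* One predicted survival curve per profile; S holds the survival estimate */
proc phreg data=sampledata;
	model time*event(0) = trt age bmi female black;
	baseline out=pred_profiles covariates=bsl_profiles survival=S;
run;
```

Comparing the trt = 0 and trt = 1 curves within each profile is one way to check whether conclusions depend on the particular covariate values chosen.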

ehertzmark
Calcite | Level 5

1.  The questioner is correct that the data set used for COVARIATES= needs to contain all the variables in the model.

2.  For a binary variable, whether you treat it as a CLASS variable or as an indicator (0/1 or 1/2) is irrelevant. Just make sure you know which one you did.

3.  The problems with specifying a COVARIATES= data set are that

    (a) the distributions of the covariate values could differ across the exposures.

    (b) specifying a single set of covariate values may lead to a misleading result. You might get a different ordering of the curves if you chose other values. This is a consequence of the nonlinearity (specifically, the log-linear form) of the model: the mean/median of the survival is not the survival of the mean/median.
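A small worked example of point (b), using made-up numbers: one binary covariate x with half the sample at x = 0 and half at x = 1, coefficient b = 1, and baseline survival S_0(t) = 0.8 at some time t, so that S(t | x) = 0.8^exp(x). Averaging the two individual survival probabilities and evaluating survival at the average covariate value x = 0.5 give:

```latex
\tfrac{1}{2}\left(0.8^{e^{0}} + 0.8^{e^{1}}\right) \approx \tfrac{1}{2}(0.800 + 0.545) \approx 0.673
\qquad\text{vs.}\qquad
0.8^{e^{0.5}} \approx 0.692
```

The averaged survival (about 0.673) is not the survival at the mean covariate value (about 0.692), which is exactly the gap between an averaged curve and a curve evaluated at the covariate means.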
