04-12-2016 11:31 AM - edited 04-12-2016 11:55 AM
I am conducting a survival analysis of clustered data. The goal is to identify prognostic factors for the outcome from a set of several factors of interest. The data are clustered so I am using COVS(AGGREGATE) along with the ID statement to specify the clustering variable in PROC PHREG (the observational unit is an eye and the clustering variable is mouseID -- each mouse has two eyes). To perform a backwards stepwise selection, I am using the SELECTION=STEPWISE option along with START=100 to include all candidate variables in the initial model (100 is much larger than the actual number of variables) and INCLUDE=1 to include one control variable in all models.
It seems to me that PROC PHREG is using the Type 3 p-values with "Model-based Variance Estimate", but I would like the selection process to be based upon the Type 3 p-values with "Sandwich Variance Estimate", to control for the clustering. The DETAILS keyword in the MODEL statement will print the results from each iteration along with the different sets of p-values (if you specify COVM along with COVS(AGGREGATE) in the PHREG statement).
My question: Is there a way I can change what appears to be the default behavior of the SELECTION= option and use the p-values from the Sandwich Variance Estimate? I could do the selection manually without the SELECTION= option by calling PHREG over and over again, but it would be nice if I could utilize the built-in model selection tools.
Here is the basic code I am using. Thank you!
PROC PHREG DATA=myData COVS(AGGREGATE) COVM;
  CLASS controlVar classVar1 classVar2;
  MODEL Days*Outcome(0)=controlVar classVar1 classVar2 numVar1 numVar2
        / TIES=EXACT RL=WALD INCLUDE=1 START=100 SELECTION=STEPWISE
          SLE=0.05 SLS=0.05 TYPE3(WALD) DETAILS;
  ID mouseID;
RUN;
controlVar - A classification variable to be included in all models
classVar1, classVar2 ... - Classification variables of interest as prognostic factors
numVar1, numVar2 ... - Continuous numeric variables of interest as prognostic factors
Days - Number of days until the outcome or censoring occurs
Outcome - Equal to 1 if the outcome occurs in the eye or 0 if the eye is censored (no outcome before end of observation)
mouseID - Unique identifier of a study participant
SAS Version: SAS/STAT 13.1
04-14-2016 08:48 AM
Hmm. Would it be that hard to do it manually? Our mouse experiments typically involve a limited number of mice (e.g. 10-20), and that limits the number of candidate variables to consider. Harrell (in "Regression Modeling Strategies") recommends limiting the number of candidate predictors to the number of mice experiencing failures.
04-15-2016 07:39 AM
No, it would not be too onerous. In fact, I have already done it. But it struck me as strange that the SELECTION= process essentially ignores the fact that you are using COVS(AGGREGATE) with the ID statement. It was not obvious either, except that I had a P=0.10 factor left in the final model even though my SLE= and SLS= were both 0.05. Once I added the DETAILS keyword to the MODEL statement, so I could see both the model-based and sandwich p-values, it became clear what was happening -- the p-value for the model-based estimate of this factor was 0.02.
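For anyone who wants to repeat the manual approach, here is a rough sketch of a single elimination step, not a tested macro. It reuses the variable names from the code above. The data set name type3_step1 is made up for illustration, and ModelANOVA is, as far as I know, the ODS table name for the Type 3 tests in PHREG, so treat that as an assumption. It also assumes that, with COVS(AGGREGATE) and without COVM, the printed Type 3 Wald tests are the sandwich-based ones, which is worth checking against the output.

```sas
/* One step of a manual backward elimination (sketch only).            */
/* Assumption: ModelANOVA is the ODS table holding the Type 3 tests.   */
/* type3_step1 is a hypothetical name for the captured table.          */
ods output ModelANOVA=type3_step1;

proc phreg data=myData covs(aggregate);
  class controlVar classVar1 classVar2;
  /* No SELECTION=, START=, INCLUDE=, SLE= or SLS=: the model is fixed */
  /* for this step and the candidate list is edited by hand.           */
  model Days*Outcome(0) = controlVar classVar1 classVar2 numVar1 numVar2
        / ties=exact rl=wald type3(wald);
  id mouseID;   /* clustering variable for the aggregated sandwich estimate */
run;

/* Inspect the captured Type 3 table before deciding what to drop. */
proc print data=type3_step1;
run;
```

After each run, drop the candidate (never controlVar) with the largest Type 3 p-value above 0.05 from the MODEL statement, and from the CLASS statement if it is a classification variable, then refit. Stop when every remaining candidate is below the threshold.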
It would be nice if a future version of SAS added an option to provide a choice of variance estimates to be used in the selection process, or if it would at least specify in the notes which variance estimate is being used in the selection process. Another nice feature would be to allow the use of SELECTION= with the RANDOM statement (frailty models).
Thanks for the reply!
04-15-2016 09:19 AM
These are good ideas. Please post them to
so the SAS management will see them and consider them for the future.