Linkachu
Obsidian | Level 7

 

I am conducting a survival analysis of clustered data. The goal is to identify prognostic factors for the outcome from a set of several factors of interest. Because the data are clustered, I am using COVS(AGGREGATE) along with the ID statement to specify the clustering variable in PROC PHREG (the observational unit is an eye and the clustering variable is mouseID -- each mouse has two eyes). To perform a backward stepwise selection, I am using the SELECTION=STEPWISE option along with START=100 to include all candidate variables in the initial model (100 is much larger than the actual number of variables) and INCLUDE=1 to keep one control variable in all models.

 

It seems to me that PROC PHREG is using the Type 3 p-values with "Model-based Variance Estimate", but I would like the selection process to be based upon the Type 3 p-values with "Sandwich Variance Estimate", to control for the clustering. The DETAILS keyword in the MODEL statement will print the results from each iteration along with the different sets of p-values (if you specify COVM along with COVS(AGGREGATE) in the PHREG statement).

 

My question:  Is there a way I can change what appears to be the default behavior of the SELECTION= option and use the p-values from the Sandwich Variance Estimate? I could do the selection manually without the SELECTION= option by calling PHREG over and over again, but it would be nice if I could utilize the built-in model selection tools.

 

Here is the basic code I am using. Thank you!

 

 

PROC PHREG DATA=myData COVS(AGGREGATE) COVM;
  CLASS controlVar classVar1 classVar2;
  MODEL Days*Outcome(0)=controlVar classVar1 classVar2 numVar1 numVar2 
    / TIES=EXACT RL=WALD INCLUDE=1 START=100 SELECTION=STEPWISE 
      SLE=0.05 SLS=0.05 TYPE3(WALD) DETAILS; 
  ID mouseID;
RUN;

 

 

controlVar - A classification variable to be included in all models

classVar1, classVar2 ... - Classification variables of interest as prognostic factors

numVar1, numVar2 ... - Continuous numeric variables of interest as prognostic factors

Days - Number of days until the outcome or censoring occurs

Outcome - Equal to 1 if the outcome occurs in the eye or 0 if the eye is censored (no outcome before end of observation)

mouseID - Unique identifier of a study participant

 

SAS Version:  SAS/STAT 13.1

 

3 REPLIES
Doc_Duke
Rhodochrosite | Level 12

Hmm. Would it be that hard to do it manually? Our mouse experiments typically involve a limited number of mice (e.g. 10-20), and that limits the number of candidate variables to consider. Harrell (in "Regression Modeling Strategies") recommends limiting the number of candidate predictors to the number of mice experiencing failures.

Linkachu
Obsidian | Level 7

No, it would not be too onerous. In fact, I have already done it. But it struck me as strange that the SELECTION= process essentially ignores the fact that you are using COVS(AGGREGATE) with the ID statement. It was not obvious either, except that I had a P=0.10 factor left in the final model even though my SLE= and SLS= were both 0.05. Once I added the DETAILS keyword to the MODEL statement, so I could see both the model-based and sandwich p-values, it became clear what was happening -- the p-value for the model-based estimate of this factor was 0.02.
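
For anyone who wants to repeat the manual approach, one elimination step might look roughly like the sketch below. It is only an illustration under a few assumptions: that leaving out COVM makes the sandwich-based Wald tests the only Type 3 table produced, and that the ODS table name ModelANOVA refers to that table in your release (ODS TRACE ON will confirm the name). The macro name fitStep and the data set work.type3 are made up for this example.

```sas
/* Hypothetical helper: fit one candidate model with the sandwich      */
/* variance estimate only and capture the Type 3 Wald tests.           */
%macro fitStep(classVars=, numVars=);
  /* Assumed ODS table name for the Type 3 tests; verify with ODS TRACE */
  ods output ModelANOVA=work.type3;
  proc phreg data=myData covs(aggregate);
    class controlVar &classVars;
    model Days*Outcome(0) = controlVar &classVars &numVars
          / ties=exact rl=wald type3(wald);
    id mouseID;   /* clustering variable: two eyes per mouse */
  run;
  /* Inspect the sandwich-based p-values for this step */
  proc print data=work.type3;
  run;
%mend fitStep;

/* Step 1: start with every candidate variable */
%fitStep(classVars=classVar1 classVar2, numVars=numVar1 numVar2);

/* Step 2 (example only): if classVar2 had the largest p-value above   */
/* 0.05, drop it and refit; controlVar always stays in the model.      */
%fitStep(classVars=classVar1, numVars=numVar1 numVar2);
```

After each call, the effect with the largest sandwich-based p-value above the stay threshold is removed from the parameter lists by hand, and the macro is run again until every remaining effect (other than controlVar) meets the threshold.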

 

It would be nice if a future version of SAS added an option to provide a choice of variance estimates to be used in the selection process, or if it would at least specify in the notes which variance estimate is being used in the selection process. Another nice feature would be to allow the use of SELECTION= with the RANDOM statement (frailty models).

 

Thanks for the reply!

Doc_Duke
Rhodochrosite | Level 12

These are good ideas.  Please post them to

https://communities.sas.com/t5/Community-Suggestion-Box/idb-p/community_ideas

so the SAS management will see them and consider them for the future.
