How Do I Perform Stepwise Selection in PHREG Based...

04-12-2016 11:31 AM - edited 04-12-2016 11:55 AM

I am conducting a survival analysis of clustered data. The goal is to identify prognostic factors for the outcome from a set of several factors of interest. The data are clustered so I am using COVS(AGGREGATE) along with the ID statement to specify the clustering variable in PROC PHREG (the observational unit is an eye and the clustering variable is mouseID -- each mouse has two eyes). To perform a backwards stepwise selection, I am using the SELECTION=STEPWISE option along with START=100 to include all candidate variables in the initial model (100 is much larger than the actual number of variables) and INCLUDE=1 to include one control variable in all models.

**It seems to me that PROC PHREG is using the Type 3 p-values with "Model-based Variance Estimate", but I would like the selection process to be based upon the Type 3 p-values with "Sandwich Variance Estimate", to control for the clustering.** The DETAILS keyword in the MODEL statement will print the results from each iteration along with the different sets of p-values (if you specify COVM along with COVS(AGGREGATE) in the PHREG statement).

**My question:** Is there a way to change what appears to be the default behavior of the SELECTION= option so that it uses the p-values from the Sandwich Variance Estimate? I could do the selection manually without the SELECTION= option by calling PHREG over and over again, but it would be nice to use the built-in model selection tools.

Here is the basic code I am using. Thank you!

```
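/* COVS(AGGREGATE) + ID: sandwich variance aggregated over mouseID; */
/* COVM additionally prints the model-based results.                */
PROC PHREG DATA=myData COVS(AGGREGATE) COVM;
  CLASS controlVar classVar1 classVar2;
  /* INCLUDE=1 keeps controlVar in every model; START=100 starts from the full model */
  MODEL Days*Outcome(0) = controlVar classVar1 classVar2 numVar1 numVar2
        / TIES=EXACT RL=WALD INCLUDE=1 START=100 SELECTION=STEPWISE
          SLE=0.05 SLS=0.05 TYPE3(WALD) DETAILS;
  ID mouseID;
RUN;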
```

controlVar - A classification variable to be included in all models

classVar1, classVar2 ... - Classification variables of interest as prognostic factors

numVar1, numVar2 ... - Continuous numeric variables of interest as prognostic factors

Days - Number of days until the outcome or censoring occurs

Outcome - Equal to 1 if the outcome occurs in the eye or 0 if the eye is censored (no outcome before end of observation)

mouseID - Unique identifier of a study participant

SAS Version: SAS/STAT 13.1

Posted in reply to Linkachu

04-14-2016 08:48 AM

Hmm. Would it be that hard to do it manually? Our mouse experiments typically involve a limited number of mice (e.g. 10-20), and that limits the number of candidate variables to consider. Harrell (in "Regression Modeling Strategies") recommends limiting the number of candidate predictors to the number of mice experiencing failures.
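
As a rough check against that guidance, the number of events can be counted directly. This is only a sketch; it assumes the data set and variable names from the original post (`myData`, `Outcome`, `mouseID`), and the output column names are made up for illustration.

```
/* Count eyes with the outcome, and distinct mice with at least one */
/* affected eye. Names are taken from the original post.            */
PROC SQL;
  SELECT COUNT(*)                AS nEyesWithEvent,
         COUNT(DISTINCT mouseID) AS nMiceWithEvent
  FROM myData
  WHERE Outcome = 1;
QUIT;
```

Comparing `nMiceWithEvent` with the number of candidate variables gives a quick sense of whether the candidate list is too long for the data.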

Posted in reply to Doc_Duke

04-15-2016 07:39 AM

No, it would not be too onerous. In fact, I have already done it. But it struck me as strange that the SELECTION= process essentially ignores the fact that you are using COVS(AGGREGATE) with the ID statement. It was not obvious either; I only noticed because a factor with P=0.10 was left in the final model even though my SLE= and SLS= were both 0.05. Once I added the DETAILS keyword to the MODEL statement, so I could see both the model-based and sandwich p-values, it became clear what was happening -- the model-based p-value for this factor was 0.02.
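
For anyone wanting to automate the manual approach, here is a minimal sketch of backward elimination driven by the sandwich Type 3 p-values. It is not a built-in feature, and it rests on several assumptions that should be checked before use: the macro name `backwardSandwich` and its parameters are hypothetical; the Type 3 table is assumed to be available through ODS as `ModelANOVA` with columns `Effect` and `ProbChiSq` (verify with `ODS TRACE ON` and `PROC CONTENTS`); omitting COVM is assumed to leave only the sandwich-based Type 3 tests in that table; and `keep=` is assumed to hold a single forced-in variable, like INCLUDE=1 in the original code.

```
%MACRO backwardSandwich(data=myData, keep=controlVar,
                        class=controlVar classVar1 classVar2,
                        cand=classVar1 classVar2 numVar1 numVar2,
                        sls=0.05);
  %LOCAL worst worstP done;
  %LET done = 0;

  %DO %WHILE (&done = 0 AND %LENGTH(&cand) > 0);

    /* Fit the current model quietly; only the sandwich variance is requested. */
    /* CLASS may list variables that have already been dropped from the model. */
    ODS EXCLUDE ALL;
    ODS OUTPUT ModelANOVA=_type3;   /* assumed ODS table name */
    PROC PHREG DATA=&data COVS(AGGREGATE);
      CLASS &class;
      MODEL Days*Outcome(0) = &keep &cand / TIES=EXACT TYPE3(WALD);
      ID mouseID;
    RUN;
    ODS EXCLUDE NONE;

    /* Pick the candidate effect with the largest Type 3 p-value. */
    PROC SQL NOPRINT;
      SELECT Effect, ProbChiSq FORMAT=12.10
        INTO :worst TRIMMED, :worstP TRIMMED
      FROM _type3
      WHERE UPCASE(Effect) NE UPCASE("&keep")
      ORDER BY ProbChiSq DESC;
    QUIT;

    %IF %SYSEVALF(&worstP > &sls) %THEN %DO;
      %PUT NOTE: Removing &worst (sandwich Type 3 p=&worstP);
      /* Drop that effect name from the candidate list (whole word only). */
      %LET cand = %SYSFUNC(PRXCHANGE(s/\b&worst\b//i, -1, &cand));
    %END;
    %ELSE %LET done = 1;

  %END;

  /* Refit the final model with normal output. */
  PROC PHREG DATA=&data COVS(AGGREGATE);
    CLASS &class;
    MODEL Days*Outcome(0) = &keep &cand / TIES=EXACT RL=WALD TYPE3(WALD);
    ID mouseID;
  RUN;
%MEND backwardSandwich;

%backwardSandwich();
```

This only removes variables (pure backward elimination); it does not re-enter dropped variables the way SELECTION=STEPWISE can, so SLE= has no counterpart here.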

It would be nice if a future version of SAS added an option to provide a choice of variance estimates to be used in the selection process, or if it would at least specify in the notes which variance estimate is being used in the selection process. Another nice feature would be to allow the use of SELECTION= with the RANDOM statement (frailty models).

Thanks for the reply!

Posted in reply to Linkachu

04-15-2016 09:19 AM

These are good ideas. Please post them to

https://communities.sas.com/t5/Community-Suggestion-Box/idb-p/community_ideas

so the SAS management will see them and consider them for the future.