Best-subset instead of stepwise

Occasional Contributor
Posts: 17

Hello, I have classes of individuals grouped together from cluster analysis. I want to use discriminant analysis to determine group membership of new individuals based on a set of predictors. Normally, I use PROC STEPDISC to find a subset of predictors that go into the discriminant analysis, something like:

proc stepdisc data=training sle=0.05 singular=0.1;

     class group;

     var VAR1--VAR25;

run;

However, recent literature indicates that stepwise selection is not as good as evaluating all possible subsets of predictors. Is there a procedure, or some other approach, that can do this? I have looked at the PHREG, REG, and LOGISTIC procedures, but they all seem to be based on numerical data rather than classes. Have I missed something? Or should I just convert the group data from text to numerical?

Thanks in advance.

peat

Respected Advisor
Posts: 4,646

Re: Best-subset instead of stepwise

Best variable subset selection isn't available in PROC STEPDISC. If you have only two groups, or if you want to explore group differences two groups at a time, you can perform best variable subset selection in PROC LOGISTIC:

title "Discriminating groups A and B";

proc logistic data=training(where=(group in ("A", "B")));

class group;

model group(event="B") = VAR1 -- VAR25 / selection=score best=3 stop=5;

run;

PG

Occasional Contributor
Posts: 17

Re: Best-subset instead of stepwise

Hi PG, and thanks for the response. I actually have 4 groups (sometimes more). It looks like I can just use:

proc logistic data=training;

class group;

model group= VAR1 -- VAR25 / selection=score best=3 stop=5;

run;

This is very helpful. However, is there a way to compare the output models for overfitting? For example, are four predictors really better than three?

Cheers,

peat

Trusted Advisor
Posts: 2,113

Re: Best-subset instead of stepwise

Peat,

Probably the best way to address overfitting is with bootstrapping. There is a substantial literature on it.
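
For anyone who wants a starting point, here is a minimal sketch of one way bootstrapping could be applied to this question. It is an illustration under assumptions, not a definitive recipe: VAR3, VAR7, VAR12 and VAR18 are placeholders for whichever four-predictor subset the best-subset step suggested, the 200 resamples and the seed are arbitrary, and the model keeps PROC LOGISTIC's default link as in the code earlier in the thread. The idea is to refit the candidate model on each resample and look at how stable each coefficient is.

```sas
/* Draw bootstrap resamples: sample with replacement, each the size of TRAINING. */
proc surveyselect data=training out=boot seed=12345
                  method=urs samprate=1 outhits reps=200;
run;

/* Refit the candidate model on every resample and keep the estimates. */
ods exclude all;                          /* suppress 200 sets of printed output */
proc logistic data=boot;
   by replicate;                          /* Replicate is created by SURVEYSELECT */
   model group = VAR3 VAR7 VAR12 VAR18;   /* placeholder subset (assumption) */
   ods output ParameterEstimates=bootest;
run;
ods exclude none;

/* Summarise the bootstrap distribution of each slope (intercepts dropped). */
proc means data=bootest mean std p5 p95;
   where Variable ne "Intercept";
   class Variable;
   var Estimate;
run;
```

If the 5th to 95th percentile range for one predictor comfortably straddles zero, that is a hint the extra predictor is not adding much. This only checks coefficient stability; a fuller treatment would also estimate how optimistic the apparent classification error is.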

Doc Muhlbaier

Duke

Occasional Contributor
Posts: 17

Re: Best-subset instead of stepwise

Thanks Duke, I will look into it.

peat
