
Best-subset instead of stepwise

Occasional Contributor
Posts: 17


Best-subset instead of stepwise question.

Hello, I have classes of individuals grouped together from cluster analysis. I want to use discriminant analysis to determine group membership of new individuals based on a set of predictors. Normally, I use PROC STEPDISC to find a subset of predictors that go into the discriminant analysis, something like:

proc stepdisc data=training sle=0.05 singular=0.1;
     class group;
     var VAR1--VAR25;
run;

However, recent literature indicates that stepwise selection is not as good as evaluating all possible subsets of predictors. Is there a procedure, or some other approach, that can do this? I have looked at the PHREG, REG, and LOGISTIC procedures, but they all seem to be based on numeric data rather than classes. Have I missed something, or should I just convert the group data from text to numeric?

Thanks in advance.

peat

Respected Advisor
Posts: 4,920

Re: Best-subset instead of stepwise

Posted in reply to peatjohnston

Best-subset variable selection isn't available in PROC STEPDISC. If you have only two groups, or if you want to explore group differences two groups at a time, you can perform best-subset selection in PROC LOGISTIC:

title "Discriminating groups A and B";
proc logistic data=training(where=(group in ("A", "B")));
     class group;
     model group(event="B") = VAR1 -- VAR25 / selection=score best=3 stop=5;
run;

PG
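A minimal sketch of the "two groups at a time" approach, assuming the group labels are single-word values such as A, B, C, and D. The labels and the macro name %pairwise_best are hypothetical. The macro simply reruns the PROC LOGISTIC step above once for every pair of groups:

```sas
/* Hypothetical macro: repeat the two-group best-subset search for
   every pair of group labels. Labels must be single words (A B C D
   are placeholders); they are split on blanks by COUNTW and %SCAN. */
%macro pairwise_best(groups=A B C D);
  %local n i j gi gj;
  %let n = %sysfunc(countw(&groups));
  %do i = 1 %to %eval(&n - 1);
    %do j = %eval(&i + 1) %to &n;
      %let gi = %scan(&groups, &i);
      %let gj = %scan(&groups, &j);
      title "Discriminating groups &gi and &gj";
      proc logistic data=training(where=(group in ("&gi", "&gj")));
        class group;
        model group(event="&gj") = VAR1 -- VAR25
              / selection=score best=3 stop=5;
      run;
    %end;
  %end;
  title;  /* clear the title afterwards */
%mend pairwise_best;

%pairwise_best(groups=A B C D)
```

With four groups this produces six separate best-subset tables, one per pair, which can be compared to see which predictors separate which groups.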

Occasional Contributor
Posts: 17

Re: Best-subset instead of stepwise

Hi PG, and thanks for the response. I actually have 4 groups (sometimes more). It looks like I can just use:

proc logistic data=training;
     class group;
     model group = VAR1 -- VAR25 / selection=score best=3 stop=5;
run;

This is very helpful. However, is there a way to compare the output models for overfitting? For example, are four predictors really better than three?

Cheers,

peat
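One way to check whether a larger subset actually classifies new individuals better is to compare leave-one-out (cross-validated) error rates in PROC DISCRIM, the procedure that will do the final classification. This is a swapped-in technique, not something proposed in the thread. A minimal sketch, assuming two hypothetical candidate subsets (VAR3 VAR7 VAR12, and the same plus VAR20); substitute the subsets that SELECTION=SCORE actually reports. Also keep in mind that with more than two response levels PROC LOGISTIC fits a cumulative (ordinal) logit model by default, which treats the cluster groups as if they were ordered.

```sas
/* Placeholder subsets: VAR3, VAR7, VAR12 and VAR20 are hypothetical.
   CROSSVALIDATE requests leave-one-out classification of TRAINING. */
title "Candidate subset with 3 predictors";
proc discrim data=training crossvalidate;
  class group;
  var VAR3 VAR7 VAR12;
run;

title "Candidate subset with 4 predictors";
proc discrim data=training crossvalidate;
  class group;
  var VAR3 VAR7 VAR12 VAR20;
run;
title;
```

Compare the cross-validation error count estimates from the two runs. If the four-predictor subset does not lower the cross-validated error noticeably, the extra predictor is probably fitting noise.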

Trusted Advisor
Posts: 2,115

Re: Best-subset instead of stepwise

Posted in reply to peatjohnston

Peat,

Probably the best way to address overfitting is with bootstrapping. There is a substantial literature on it.

Doc Muhlbaier

Duke
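A minimal bootstrap sketch, assuming the two-group setup from PG's example (groups A and B) and an arbitrary 200 replicates and seed. PROC SURVEYSELECT draws samples with replacement, and the best-subset search is rerun on each one. If the same small subset keeps coming out on top across replicates, it is more trustworthy than one that wins only on the original data. The data set names boot and best_boot are hypothetical, and BestSubsets is my recollection of the ODS table name for SELECTION=SCORE; run ODS TRACE ON once to confirm it in your SAS release.

```sas
/* Draw 200 bootstrap samples, each the same size as the input, by
   sampling with replacement (METHOD=URS, SAMPRATE=1). OUTHITS writes
   one row per selection; SURVEYSELECT adds a Replicate variable.
   REPS= and SEED= values are arbitrary choices. */
proc surveyselect data=training(where=(group in ("A", "B")))
                  out=boot method=urs samprate=1 reps=200
                  outhits seed=20240501;
run;

/* Rerun the best-subset search on every replicate. Printed output is
   suppressed; the best subsets are saved in BEST_BOOT.
   BestSubsets is the assumed ODS table name for SELECTION=SCORE. */
ods exclude all;
proc logistic data=boot;
  by Replicate;
  class group;
  model group(event="B") = VAR1 -- VAR25 / selection=score best=1 stop=5;
  ods output BestSubsets=best_boot;
run;
ods exclude none;

/* Look at which subsets were chosen in the first few replicates. */
proc print data=best_boot(obs=25);
run;
```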

Occasional Contributor
Posts: 17

Re: Best-subset instead of stepwise

Thanks Duke, I will look into it.

peat
