Best-subset instead of stepwise

peatjohnston — Mon, 15 Jul 2013 00:56:33 GMT

Best-subset instead of stepwise question.

Hello, I have classes of individuals grouped together from cluster analysis. I want to use discriminant analysis to determine group membership of new individuals based on a set of predictors. Normally, I use PROC STEPDISC to find a subset of predictors that go into the discriminant analysis, something like:

proc stepdisc data=training sle=0.05 singular=0.1;
   class group;
   var VAR1--VAR25;
run;

However, recent literature indicates that stepwise selection is not as good as evaluating all possible subsets of predictors. Is there a procedure, or some other approach, that can do this? I have looked at the PHREG, REG, and LOGISTIC procedures, but they all seem to be based on numerical data rather than classes. Have I missed something? Or should I just convert the group data from text to numerical?

Thanks in advance.

peat

Re: Best-subset instead of stepwise

PGStats — Mon, 15 Jul 2013 02:48:36 GMT

Best variable subset selection isn't available in PROC STEPDISC. If you have only two groups, or if you want to explore group differences two groups at a time, you can perform best variable subset selection in PROC LOGISTIC:

title "Discriminating groups A and B";
proc logistic data=training(where=(group in ("A", "B")));
   class group;
   model group(event="B") = VAR1 -- VAR25 / selection=score best=3 stop=5;
run;

Re: Best-subset instead of stepwise

peatjohnston — Mon, 15 Jul 2013 11:29:15 GMT

Hi PG, and thanks for the response. I actually have 4 groups (sometimes more). It looks like I can just use:

proc logistic data=training;
   class group;
   model group = VAR1 -- VAR25 / selection=score best=3 stop=5;
run;

This is very helpful. However, is there a way to compare the output models for overfitting? For example, are four predictors really better than three?

Cheers,

peat

Re: Best-subset instead of stepwise

Doc_Duke — Mon, 15 Jul 2013 19:07:01 GMT

Peat,

Probably the best way to address overfitting is with bootstrapping. There is a substantial literature on it.

Doc Muhlbaier

Duke
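
For readers wondering what a bootstrap check might look like in practice, here is a minimal sketch, not something posted in the thread. It assumes one candidate subset (VAR1, VAR2, VAR3 are placeholders for whatever the best-subset step returned), 200 resamples, an arbitrary seed, and the hypothetical data set names boot and scored. It also relies on PROC DISCRIM classifying the whole TESTDATA= data set once per BY group when that data set lacks the BY variable, which is worth confirming against the documentation for your SAS release.

```sas
/* Sketch under stated assumptions: VAR1-VAR3 stand in for one candidate  */
/* subset; BOOT and SCORED are hypothetical data set names.               */

/* 1. Draw 200 resamples of TRAINING, each the same size, with            */
/*    replacement. OUTHITS writes one row per selection; the resample     */
/*    number is stored in the variable Replicate.                         */
proc surveyselect data=training out=boot seed=12345
                  method=urs samprate=1 outhits reps=200 noprint;
run;

/* 2. Fit the discriminant rule on each resample and classify the         */
/*    original observations with it. The predicted group is written       */
/*    to the variable _INTO_ in the TESTOUT= data set.                    */
proc discrim data=boot testdata=training testout=scored noprint;
   by replicate;
   class group;
   var VAR1 VAR2 VAR3;
run;
```

Comparing group with _INTO_ in scored gives a misclassification rate for each resample. Repeating the two steps for the three-predictor and four-predictor candidates and comparing the spread of those rates is one rough way to judge whether the extra predictor helps. The optimism-corrected estimators described in the bootstrap literature are more careful than this.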

Re: Best-subset instead of stepwise

peatjohnston — Mon, 15 Jul 2013 20:05:41 GMT

Thanks Duke, I will look into it.

peat