peatjohnston
Calcite | Level 5

Best-subset instead of stepwise question.

Hello, I have classes of individuals grouped together from cluster analysis. I want to use discriminant analysis to determine group membership of new individuals based on a set of predictors. Normally, I use PROC STEPDISC to find a subset of predictors that go into the discriminant analysis, something like:

proc stepdisc data=training sle=0.05 singular=0.1;
     class group;
     var VAR1--VAR25;
run;

However, recent literature indicates that stepwise selection is not as good as evaluating all possible subsets of predictors. Is there a procedure, or some other approach, that can do this? I have looked at the PHREG, REG, and LOGISTIC procedures, but they all seem to be based on numerical data rather than classes. Have I missed something? Or should I just convert the group data from text to numerical?

Thanks in advance.

peat

4 REPLIES 4
PGStats
Opal | Level 21

Best variable subset selection isn't available in PROC STEPDISC. If you have only two groups, or if you want to explore group differences two groups at a time, you can perform best variable subset selection in PROC LOGISTIC:

title "Discriminating groups A and B";
proc logistic data=training(where=(group in ("A", "B")));
     class group;
     model group(event="B") = VAR1 -- VAR25 / selection=score best=3 stop=5;
run;

PG

peatjohnston
Calcite | Level 5

Hi PG, and thanks for the response. I actually have 4 groups (sometimes more). It looks like I can just use:

proc logistic data=training;
     class group;
     model group= VAR1 -- VAR25 / selection=score best=3 stop=5;
run;

This is very helpful. However, is there a way to compare the output models for overfitting? For example, are four predictors really better than three?

Cheers,

peat

Doc_Duke
Rhodochrosite | Level 12

Peat,

Probably the best way to address overfitting is with bootstrapping. There is a substantial literature on it.
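A rough sketch of what that could look like in SAS, offered as one possible starting point rather than a definitive recipe. It assumes the same training data set and group variable used earlier in the thread. The three-variable subset (VAR3 VAR7 VAR12), the seed, and the number of replicates are placeholders chosen only for illustration. The idea is to resample the training data with replacement, refit the discriminant rule in each resample, and see how well each refitted rule classifies the original data.

```sas
/* Sketch only: VAR3 VAR7 VAR12 is a hypothetical candidate subset;
   the seed and the number of replicates are arbitrary choices.     */

/* 1. Draw 200 bootstrap samples: unrestricted random sampling (with
      replacement), each the same size as the training data.         */
proc surveyselect data=training out=boot seed=20240
     method=urs samprate=1 outhits reps=200 noprint;
run;

/* 2. Fit the discriminant rule in each bootstrap sample and use it
      to classify the full original data set. This relies on the
      test data having no BY variable, so that all of it is scored
      in every BY group.                                             */
proc discrim data=boot testdata=training testout=scored noprint;
     by Replicate;
     class group;
     var VAR3 VAR7 VAR12;
run;

/* 3. Flag misclassified observations (_INTO_ is the assigned class). */
data scored;
     set scored;
     miss = (group ne _INTO_);
run;

/* 4. Misclassification rate for each bootstrap replicate.           */
proc means data=scored noprint nway;
     class Replicate;
     var miss;
     output out=err_by_rep mean=err_rate;
run;

/* 5. Mean and spread of the error rate across replicates.           */
proc means data=err_by_rep mean std;
     var err_rate;
run;
```

Repeating this with each candidate subset in the VAR statement gives error-rate distributions that can be compared directly: if four predictors do not clearly beat three across replicates, the extra predictor is probably not earning its place. Because each bootstrap sample shares many observations with the original data, these error rates are still somewhat optimistic; scoring only the observations left out of each resample is a stricter variant.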

Doc Muhlbaier

Duke

peatjohnston
Calcite | Level 5

Thanks Duke, I will look into it.

peat
