Best-subset instead of stepwise question.
Hello, I have classes of individuals grouped together from cluster analysis. I want to use discriminant analysis to determine group membership of new individuals based on a set of predictors. Normally, I use PROC STEPDISC to find a subset of predictors that go into the discriminant analysis, something like:
proc stepdisc data=training sle=0.05 singular=0.1;
class group;
var VAR1--VAR25;
run;
However, recent literature indicates that stepwise selection is not as good as evaluating all possible subsets of predictors. Is there a procedure, or some other approach, that can do this? I have looked at the PHREG, REG, and LOGISTIC procedures, but they all seem to be based on numerical data rather than classes. Have I missed something, or should I just convert the group data from text to numerical?
Thanks in advance.
peat
Best variable subset selection isn't available in PROC STEPDISC. If you have only two groups, or if you want to explore group differences two groups at a time, you can perform best variable subset selection in PROC LOGISTIC:
title "Discriminating groups A and B";
proc logistic data=training(where=(group in ("A", "B")));
class group;
model group(event="B") = VAR1 -- VAR25 / selection=score best=3 stop=5;
run;
PG
Hi PG, and thanks for the response. I actually have 4 groups (sometimes more). It looks like I can just use:
proc logistic data=training;
class group;
model group= VAR1 -- VAR25 / selection=score best=3 stop=5;
run;
This is very helpful. However, is there a way to compare the output models for overfitting? For example, are four predictors really better than three?
Cheers,
peat
Peat,
Probably the best way to address overfitting is with bootstrapping. There is a substantial literature on it.
Doc Muhlbaier
Duke
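For anyone who wants a starting point, here is a minimal sketch of one way the bootstrap suggestion could be tried. It is not a method given in this thread. It assumes the `training` data set and `group` class variable from the posts above. The macro name `boot_error`, the work data sets it creates, and the two candidate predictor lists are hypothetical placeholders. Each replicate fits PROC DISCRIM on a resample drawn with replacement and scores the rows that were left out of that resample (the out-of-bag rows).

```sas
/* Hypothetical helper: out-of-bag bootstrap error for one candidate predictor set. */
/* Assumes a data set TRAINING with class variable GROUP, as in the thread.        */
%macro boot_error(vars=, reps=200, seed=12345, out=boot_err);
  %local i;

  /* Tag each training row so out-of-bag rows can be identified later. */
  data _train_id;
    set training;
    _id = _n_;
  run;

  /* Draw &reps samples with replacement, each the size of the training set. */
  proc surveyselect data=_train_id out=_boot seed=&seed
                    method=urs samprate=1 outhits reps=&reps noprint;
  run;

  /* Clear any results left over from an earlier call. */
  proc datasets lib=work nolist nowarn;
    delete &out;
  quit;

  %do i = 1 %to &reps;

    /* Out-of-bag rows: training rows not drawn into replicate &i. */
    proc sql noprint;
      create table _oob as
      select * from _train_id
      where _id not in (select _id from _boot where replicate = &i);
    quit;

    /* Fit on the bootstrap sample, classify the out-of-bag rows. */
    proc discrim data=_boot(where=(replicate = &i))
                 testdata=_oob testout=_scored noprint;
      class group;
      var &vars;
    run;

    /* Misclassification rate for this replicate. */
    data _err;
      set _scored end=last;
      miss + (group ne _INTO_);
      if last then do;
        replicate = &i;
        error_rate = miss / _n_;
        output;
      end;
      keep replicate error_rate;
    run;

    proc append base=&out data=_err;
    run;

  %end;
%mend boot_error;

/* Placeholder predictor lists: substitute the subsets you want to compare. */
%boot_error(vars=VAR1 VAR2 VAR3,      out=err3);
%boot_error(vars=VAR1 VAR2 VAR3 VAR4, out=err4);

proc means data=err3 n mean std; var error_rate; run;
proc means data=err4 n mean std; var error_rate; run;
```

Because both calls use the same seed, they should see the same resamples, so the two error rates can be compared replicate by replicate. If the four-predictor error is not clearly lower than the three-predictor error, the extra predictor is probably not earning its place. Note that the subsets themselves were chosen on the full data, so these estimates are likely still somewhat optimistic. A stricter check would repeat the subset selection inside each replicate.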
Thanks Duke, I will look into it.
peat