Best-subset instead of stepwise selection (question)
Hello, I have classes of individuals grouped together from cluster analysis. I want to use discriminant analysis to determine group membership of new individuals based on a set of predictors. Normally, I use PROC STEPDISC to find a subset of predictors that go into the discriminant analysis, something like:
proc stepdisc data=training sle=0.05 singular=0.1;
   class group;
   var VAR1--VAR25;   /* the semicolon was missing here */
run;
However, recent literature indicates that stepwise selection is not as good as evaluating all possible subsets of predictors. Is there a procedure, or some other way, to do this? I have looked at the PHREG, REG, and LOGISTIC procedures, but they all seem to be based on numeric data rather than classes. Have I missed something, or should I just convert the group variable from text to numeric?
Thanks in advance.
peat
Best variable subset selection isn't available in PROC STEPDISC. If you have only two groups, or if you want to explore group differences two groups at a time, you can perform best variable subset selection in PROC LOGISTIC:
title "Discriminating groups A and B";
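/* Keep only groups A and B; the response needs no CLASS statement,     */
/* which is meant for categorical predictors                            */
proc logistic data=training(where=(group in ("A", "B")));
   /* Best 3 subsets of each size from 1 to 5 predictors, by score chi-square */
   model group(event="B") = VAR1--VAR25 / selection=score best=3 stop=5;
run;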
PG
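PG's two-groups-at-a-time approach can be repeated for every pair of groups with a small macro. This is a minimal sketch, assuming the group values are single-word labels such as the hypothetical A, B, C, and D; the macro name pairwise_subsets is also hypothetical. It simply reruns the PROC LOGISTIC step above once per pair.

```sas
/* Hypothetical macro: best-subset PROC LOGISTIC for every pair of groups. */
/* Assumes GROUPS= holds single-word group labels separated by blanks.     */
%macro pairwise_subsets(data=training, groups=A B C D);
   %local n i j gi gj;
   %let n = %sysfunc(countw(&groups));
   %do i = 1 %to &n - 1;
      %do j = &i + 1 %to &n;
         %let gi = %scan(&groups, &i);
         %let gj = %scan(&groups, &j);
         title "Discriminating groups &gi and &gj";
         proc logistic data=&data(where=(group in ("&gi", "&gj")));
            model group(event="&gj") = VAR1--VAR25
                  / selection=score best=3 stop=5;
         run;
      %end;
   %end;
   title;   /* clear the title after the last pair */
%mend pairwise_subsets;

/* With four groups this produces six pairwise analyses */
%pairwise_subsets(data=training, groups=A B C D)
```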
Hi PG, and thanks for the response. I actually have 4 groups (sometimes more). It looks like I can just use:
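proc logistic data=training;
   /* With more than two response levels, PROC LOGISTIC fits a cumulative */
   /* (ordinal) logit model by default, which treats the groups as ordered */
   model group = VAR1--VAR25 / selection=score best=3 stop=5;
run;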
This is very helpful. However, is there a way to compare the output models for overfitting? For example, are four predictors really better than three?
Cheers,
peat
Peat,
Probably the best way to address overfitting is bootstrapping; there is a substantial literature on it.
Doc Muhlbaier
Duke
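As a rough illustration of the bootstrap idea applied to the "four versus three predictors" question, the sketch below draws 200 bootstrap samples with PROC SURVEYSELECT (METHOD=URS with SAMPRATE=1 and OUTHITS, i.e. sampling with replacement at the original sample size), refits two candidate subsets in every sample using BY-group processing, and compares their AIC. It is a minimal sketch, not a full optimism-corrected validation. The subsets VAR3 VAR7 VAR12 and VAR3 VAR7 VAR12 VAR19, the seed, and the number of replicates are hypothetical placeholders; substitute the best models reported by SELECTION=SCORE. LINK=GLOGIT is used so that unordered cluster groups are not treated as ordinal.

```sas
/* Hypothetical candidate subsets: replace with your SELECTION=SCORE results */
%let model3 = VAR3 VAR7 VAR12;
%let model4 = VAR3 VAR7 VAR12 VAR19;

/* 200 bootstrap samples drawn with replacement, each the size of TRAINING */
proc surveyselect data=training out=boot seed=20240417
                  method=urs samprate=1 outhits reps=200;
run;

ods graphics off;
ods exclude all;   /* suppress printed output; ODS OUTPUT still writes tables */
proc logistic data=boot;
   by Replicate;
   model group = &model3 / link=glogit;   /* generalized logit: unordered groups */
   ods output FitStatistics=fit3;
run;

proc logistic data=boot;
   by Replicate;
   model group = &model4 / link=glogit;
   ods output FitStatistics=fit4;
run;
ods exclude none;

/* AIC of each model per replicate; negative DIFF favors the 4-predictor model */
data aic_diff;
   merge fit3(where=(Criterion="AIC") rename=(InterceptAndCovariates=aic3))
         fit4(where=(Criterion="AIC") rename=(InterceptAndCovariates=aic4));
   by Replicate;
   diff = aic4 - aic3;
   four_better = (diff < 0);   /* 1 if the larger model has the lower AIC */
   keep Replicate aic3 aic4 diff four_better;
run;

/* Mean of FOUR_BETTER = share of replicates favoring four predictors */
proc means data=aic_diff mean p5 median p95;
   var diff four_better;
run;
```

If the four-predictor model wins in only a modest share of replicates, or the 5th to 95th percentile range of DIFF straddles zero, the extra predictor is probably not earning its keep. Because the candidate subsets were themselves chosen on the full data, a stricter version would rerun the whole selection step inside each replicate.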
Thanks, Duke. I will look into it.
peat