I don't understand why this is happening, but the lowest prediction error rate I'm getting is by including ALL of the variables.
I think you need to be open to what the data is telling you. This is what the data is telling you.
HOWEVER
There is such a thing as overfitting. When a model is overfit, you have added at least one variable (possibly more than one) that is essentially being fit to the random noise in the data rather than to the signal. If you have overfitting, you ought to remove terms from the model; this will give you WORSE fit statistics but a more "stable" model (that is, one whose predictions vary less on new data). So avoiding overfitting gives you WORSE fit on the data at hand, but a better model by other measures.
How do you avoid overfitting in PROC DISCRIM? You can use the CROSSVALIDATE option, which shows you the classifications using leave-one-out cross-validation. If those classification results are poor, remove terms from the model until the cross-validation statistics get closer to perfect classification (realizing that perfect classification isn't really achievable).
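As a minimal sketch of what that looks like (the data set name, class variable, and predictor names here are placeholders, not from your data):

```sas
/* Sketch only: MYDATA, REGION, and X1-X3 are placeholder names.
   CROSSVALIDATE requests leave-one-out cross-validated
   classification results alongside the resubstitution results. */
proc discrim data=mydata crossvalidate;
   class region;
   var x1 x2 x3;
run;
```

Compare the cross-validation error rates in the output with the resubstitution error rates; a large gap between the two is the symptom of overfitting described below.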
There is an example in the PROC DISCRIM documentation where the cross-validation error rates are much higher than the resubstitution error rates of the model, and this indicates the model has been overfit.
I have never been a fan of stepwise methods, and I avoid them like the plague. Google "problems with stepwise". What would I use? I would use PLS Discriminant Analysis (PLS-DA) which is PROC PLS with dummy variables for Y to indicate which region the observation is.
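A rough sketch of the PLS-DA setup (again, all data set, variable, and region names are placeholders I made up for illustration): create one 0/1 dummy response per region, then fit PROC PLS on those dummies.

```sas
/* Sketch only: names are placeholders. One dummy Y per region. */
data mydata2;
   set mydata;
   y_north = (region = 'North');
   y_south = (region = 'South');
   y_east  = (region = 'East');
run;

/* CV=ONE requests leave-one-out cross-validation to choose
   the number of PLS factors, which guards against overfitting. */
proc pls data=mydata2 cv=one;
   model y_north y_south y_east = x1 x2 x3;
run;
```

Each observation is then assigned to the region whose predicted dummy value is largest, and the built-in cross-validation picks how many factors to keep rather than you picking variables by hand.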