02-27-2016 03:45 PM
I was very eager to try LASSO variable selection for LOGISTIC regression, but I can't seem to find the answer to the standardization question. The variables selected with STEPWISE and LASSO are completely different. I was under the impression that LASSO would select a robust subset of the STEPWISE choice, but that is not what is happening. LASSO does not select any of the dichotomous class variables, which in my dataset are all coded 0/1 and tend to have larger raw beta coefficients. I wonder if it's a matter of scale?
1. Does SAS standardize the variables automatically, or do I need to standardize them before I run HPGENSELECT?
2. What about polynomials--do they need to be standardized as well?
3. Do predictors need to be centered?
02-29-2016 08:47 AM
PROC HPGENSELECT supports a NOCENTER option, which the documentation describes as follows:
"requests that continuous main effects not be centered and scaled internally. (Continuous main effects are centered and scaled by default to aid in computing maximum likelihood estimates.) Parameter estimates and related statistics are always reported on the original scale."
From this you can infer that, by default, the procedure centers and scales the continuous main effects internally, and that you can turn this feature off with NOCENTER.
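To make the difference concrete, here is a minimal sketch of the two calls side by side. The data set and variable names (work.train, y, x1, x2, flag1) are hypothetical placeholders, and the remaining statements are only one plausible way to set up a binary LASSO model, so check them against the documentation for your release.

```sas
/* Hypothetical names: work.train, y, x1, x2, flag1 are placeholders. */

/* Default behavior: continuous main effects are centered and scaled
   internally; estimates are still reported on the original scale. */
proc hpgenselect data=work.train;
   class flag1;
   model y(event='1') = x1 x2 flag1 / dist=binary;
   selection method=lasso;
run;

/* Same model with the internal centering and scaling turned off. */
proc hpgenselect data=work.train nocenter;
   class flag1;
   model y(event='1') = x1 x2 flag1 / dist=binary;
   selection method=lasso;
run;
```

Comparing the selected effects from the two runs is one way to see how much the internal scaling influences the LASSO selection on your data.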
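Predictors do not need to be centered, since the intercept term accounts for the center of the predictors: shifting a predictor x by a constant c changes only the intercept (from b0 to b0 + b1*c) and leaves the slope b1 unchanged.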