I was very eager to try LASSO variable selection for LOGISTIC regression, but I can't seem to find an answer to the standardization question. The variables selected with STEPWISE and with LASSO are completely different. I was under the impression that LASSO would select a robust subset of the STEPWISE choice, but that is not what is happening. LASSO does not select any of the dichotomous class variables, which in my dataset are all coded 0/1 and tend to have larger raw beta coefficients. I wonder if it's a matter of scale?
1. Does SAS do that automatically, or do I need to standardize my variables before I run HPGENSELECT?
2. What about polynomials--do they need to be standardized as well?
3. Do predictors need to be centered?
Thank you.
PROC HPGENSELECT supports a NOCENTER option, which is documented as follows:
NOCENTER
requests that continuous main effects not be centered and scaled internally. (Continuous main effects are centered and scaled by default to aid in computing maximum likelihood estimates.) Parameter estimates and related statistics are always reported on the original scale.
From this you can infer that, by default, the procedure centers and scales continuous main effects internally, and that you can turn this feature off with NOCENTER.
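Predictors do not need to be centered, since the intercept term accounts for the center of the predictors: in a model with an intercept, centering a main effect changes only the intercept, not that effect's slope or the fitted probabilities.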
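To illustrate the point about centering, here is a minimal sketch in plain Python (not SAS, and not how HPGENSELECT works internally). It fits an unpenalized one-predictor logistic regression by Newton's method, once on the raw predictor and once on the centered predictor. The data values and the helper name `fit_logistic` are made up for the illustration.

```python
import math

def fit_logistic(x, y, iterations=50):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton's method.

    Hypothetical helper for illustration only; returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system explicitly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up, non-separable toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 1, 0, 1]

mean_x = sum(x) / len(x)
x_centered = [xi - mean_x for xi in x]

b0_raw, b1_raw = fit_logistic(x, y)
b0_ctr, b1_ctr = fit_logistic(x_centered, y)

print("slope, raw predictor:      ", b1_raw)
print("slope, centered predictor: ", b1_ctr)
print("intercept, raw:            ", b0_raw)
print("intercept, centered:       ", b0_ctr)
print("raw intercept + slope*mean:", b0_raw + b1_raw * mean_x)
```

The two slopes agree up to numerical precision, and the centered intercept equals the raw intercept plus the slope times the predictor mean. Scaling is a different matter: dividing a predictor by a constant multiplies its coefficient by that constant, so any penalty on coefficient size depends on the units of the predictor.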