Hello fellow SAS users and SAS support,
I have been using HPGENSELECT with LASSO selection for a binary dependent variable, and was hoping for clarification regarding the details of the LASSO penalization method and the resulting coefficients. I will post my SAS code at the end. My two questions are:
Thanks very much for any insight you can provide!
PROC HPGENSELECT data=my_data LASSORHO=.80 LASSOSTEPS=20;
WHERE location NOT IN (5,6);
CLASS gender location Physiologic_difficult_AW <many more predictors>
/ param=GLM;
MODEL Number_attempts =
gender location Physiologic_difficult_AW <many more predictors> / DISTRIBUTION=BINARY ;
SELECTION METHOD=LASSO(CHOOSE=AIC) DETAILS=ALL;
RUN;
The answer to the second question is "Yes", but there might be better ways of comparing. Once a model is selected, you could fit using GENMOD and use the LSMEANS statement with the ODDSRATIO option.
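A minimal sketch of that GENMOD refit follows. It assumes that gender and location are among the effects HPGENSELECT retained (substitute the effects it actually selected) and that the higher value of Number_attempts is the event of interest, which is why DESCENDING is used.

```sas
/* Refit the LASSO-selected model by ordinary maximum likelihood.        */
/* ASSUMPTION: gender and location stand in for the selected effects.    */
proc genmod data=my_data descending;   /* model P(higher response value) */
   where location not in (5,6);
   class gender location / param=glm;  /* LSMEANS needs GLM coding       */
   model Number_attempts = gender location / dist=binomial link=logit;
   /* Pairwise LS-mean differences for location, also reported as        */
   /* odds ratios with confidence limits                                 */
   lsmeans location / diff oddsratio cl;
run;
```

These estimates are unpenalized, and the standard errors and p-values do not account for the selection step.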
SteveDenham
Please check this paper on HPGENSELECT: https://support.sas.com/resources/papers/proceedings15/SAS1742-2015.pdf
Agreeing with @SteveDenham
The different parameterizations are the same model. Interpreting the coefficients is the part that trips people up, but LSMEANS eliminates all of that confusion. I wrote a post about this issue (although in a simpler example). https://communities.sas.com/t5/Statistical-Procedures/Interpreting-Multivariate-Linear-Regression-wi...
Thanks very much for your helpful reply. I think LSMEANS is a lovely tool and certainly would be useful here.
Just one tangential comment regarding:
The different parameterizations are the same model.
This is generally true: fit statistics are invariant to parameterization for OLS and ML models. With LASSO, however, the choice of parameterization can affect both variable selection and the shrunken estimates! In a way this makes sense. If we choose a reference category that lies in the middle of the others with respect to the outcome, the contrast coefficients will be small and could get "shrunk away" to zero during optimization. Group LASSO, which keeps or drops all of a CLASS variable's design columns together, is less vulnerable.
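One way to check this sensitivity on real data is to rerun the LASSO with reference coding and change only the reference level of one CLASS variable, then compare the parameters that survive. The sketch below assumes location has levels 1 and 4 and shortens the predictor list to gender and location. The macro name lasso_ref and the data set names pe_ref1 and pe_ref4 are hypothetical, and the ODS table name ParameterEstimates is assumed to hold the selected model's estimates.

```sas
/* Sensitivity check (sketch): does the LASSO selection change when only */
/* the reference level of LOCATION changes?                              */
/* ASSUMPTIONS: levels '1' and '4' exist; short predictor list.          */
%macro lasso_ref(reflev=, out=);
   proc hpgenselect data=my_data lassorho=.80 lassosteps=20;
      where location not in (5,6);
      /* Reference coding, with the reference level passed in            */
      class gender location(ref="&reflev") / param=ref;
      model Number_attempts = gender location / distribution=binary;
      selection method=lasso(choose=aic) details=all;
      /* Save the selected model's parameter estimates                   */
      ods output ParameterEstimates=&out;
   run;
%mend lasso_ref;

%lasso_ref(reflev=1, out=pe_ref1)
%lasso_ref(reflev=4, out=pe_ref4)

/* Compare which location contrasts were kept, and their sizes           */
proc print data=pe_ref1; run;
proc print data=pe_ref4; run;
```

If the two fits keep different location contrasts, the selection depends on the coding rather than only on the data.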
Is your dataset so large that you have to use HPGENSELECT, rather than GLMSELECT? Because if you can use the latter to do the LASSO selection, you have access to the STORE statement, from which you can use PLM to get least squares means and odds ratios.
SteveDenham