lichee
Quartz | Level 8

Hi all,

I'm new to LASSO selection. After reading through several online resources, I used PROC HPGENSELECT with SELECTION METHOD=LASSO to select covariates, but it generated two sets of output. I fed in 188 covariates. The first set of output kept 155 covariates, while the second set, generated at the end, kept only 5 covariates. The first output has lower AIC/BIC. Why was the second output generated, and how should I use the two sets of output to interpret the selection results?

 

The code looks like below:

PROC HPGENSELECT DATA=analytic_file;
   class Female(ref='0') var3(ref='0') ... var188(ref='0');
   model success(event='1') = age Female var3 ... var188 / dist=binary;
   selection method=LASSO(choose=SBC stop=none) details=all;
run;

 

Thank you!

1 REPLY 1
Rick_SAS
SAS Super FREQ

First, recall that SBC and BIC are two names for the same criterion. You specified CHOOSE=SBC, but the procedure reports this statistic in a column whose header is BIC.

 

The output contains only one selected model. Because you used the DETAILS=ALL option, the output also contains a large "Selection Details" table. I think this is the "first output" that you ask about. The Selection Details table contains columns for the AIC, AICC, and BIC at every step of the selection process. In that table, an asterisk (*) marks the model that is eventually selected as the best, based on your selection criterion (which is SBC, also known as BIC).

 

The "second output" is the selected model. It is the model that has an asterisk next to it in the "Selection Details" table. This is the final ("best") model according to your selection method and stopping criterion.
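
If it helps to see which table is which, a sketch along these lines may be useful. It runs on simulated data: the data set sim and the variables y and x1-x20 are hypothetical stand-ins, not your analytic file. ODS TRACE writes the name of every output table to the log. The table names SelectionDetails and ParameterEstimates in the ODS OUTPUT statement are my assumption about what the trace will show, so check the log and adjust them if they differ.

```sas
/* Simulated data: a hypothetical stand-in for the analytic file */
data sim;
   call streaminit(12345);
   array x[20];                       /* creates x1-x20 */
   do i = 1 to 2000;
      do j = 1 to 20;
         x[j] = rand('normal');
      end;
      /* only x1-x3 affect the response */
      eta = -0.5 + x[1] - 0.8*x[2] + 0.5*x[3];
      y = rand('bernoulli', logistic(eta));
      output;
   end;
   drop i j eta;
run;

ods trace on;     /* the log lists the name of each table produced */
proc hpgenselect data=sim;
   model y(event='1') = x1-x20 / dist=binary;
   selection method=lasso(choose=sbc stop=none) details=all;
   /* table names are assumptions; confirm them in the trace log */
   ods output SelectionDetails=sel_path ParameterEstimates=final_est;
run;
ods trace off;

/* sel_path:  one row per step of the LASSO path (the long table)  */
/* final_est: estimates for the single model chosen by SBC         */
proc print data=sel_path;  run;
proc print data=final_est; run;
```

With STOP=NONE the path keeps going well past the chosen step, which is why the details table can show many more effects than the final model keeps. Only the model chosen by the CHOOSE= criterion is the one to interpret.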
