PurpleNinja
Obsidian | Level 7

Hello Everyone,

I want to demonstrate that

a) LASSO regression is superior to stepwise selection for variable selection

b) LASSO regression is superior to linear regression for prediction

I would like to use PROC GLMSELECT in SAS 9.3 to illustrate this. Would anyone have a data set and some code to do so?

If you have just the data set but no code, that's fine - I would be glad to write the code myself.

If you have both the data and the code, that would be even better!

Thanks for your help.

Ksharp
Super User
Compare model fit statistics such as AIC, BIC, and PRESS between the two methods.

I don't understand your second question. LASSO is a variable selection method, not a regression method.

There are some better methods than LASSO, like Net-LASSO.
Check the documentation.
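
As a rough sketch of that comparison (an illustration, not code taken from the documentation): the example below assumes the Sashelp.Baseball sample data set is available in your installation. The effect list and the output data set names fit_lasso and fit_step are illustrative choices. It runs both selection methods on the same data and requests AIC, BIC, and PRESS in the fit statistics table.

```sas
/* Sketch only. Assumptions: Sashelp.Baseball exists in your installation;  */
/* the effect list and the names fit_lasso / fit_step are illustrative.     */

/* LASSO: follow the whole LASSO path and pick the step with the lowest AIC */
ods output FitStatistics=fit_lasso;
proc glmselect data=sashelp.baseball;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     crAtBat crHits crHome crRuns crRbi crBB
                     league division nOuts nAssts nError
         / selection=lasso(choose=aic stop=none)
           stats=(aic bic press);
run;

/* Stepwise: same response and candidate effects, same CHOOSE= criterion */
ods output FitStatistics=fit_step;
proc glmselect data=sashelp.baseball;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     crAtBat crHits crHome crRuns crRbi crBB
                     league division nOuts nAssts nError
         / selection=stepwise(choose=aic)
           stats=(aic bic press);
run;

/* Print the two fit statistics tables for a side-by-side look */
proc print data=fit_lasso; title 'LASSO fit statistics';    run;
proc print data=fit_step;  title 'Stepwise fit statistics'; run;
title;
```

Both runs use the same CHOOSE= criterion so that the selected models are compared on equal terms. Which method ends up with the smaller values depends on the data.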


PurpleNinja
Obsidian | Level 7

Hi Ksharp,

Sorry - I could have phrased that second question better. Suppose I generate two different models:

a) Model A is obtained from stepwise selection

b) Model B is obtained from LASSO

I want to show that the predictive accuracy of Model B is higher than that of Model A.

As Wikipedia notes, LASSO enhances the predictive accuracy of the resulting statistical model.

https://en.wikipedia.org/wiki/Lasso_(statistics)

Would you have an example data set that I can use to demonstrate this?

Thanks.

Ksharp
Super User

Yes. LASSO would be better than STEPWISE.
After running both methods with PROC GLMSELECT,
check the following fit statistics:

AIC 2172.72685
AICC 2174.27787
SBC 1736.94624
ASE (Train) 24.18515
ASE (Validate) 25.74617
ASE (Test) 22.57297

I would expect LASSO to have smaller values of these statistics than STEPWISE.
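
One way to obtain the ASE (Train), ASE (Validate), and ASE (Test) rows for both methods is to give PROC GLMSELECT a partitioned data set. The sketch below is only an illustration under stated assumptions: it uses Sashelp.Baseball (assumed to be available in your installation), and the data set name bball, the variable role, the 50/25/25 split, and the seed value are all hypothetical choices. The roles are assigned once in a DATA step so that both methods see exactly the same training, validation, and test observations.

```sas
/* Sketch only. Assumptions: Sashelp.Baseball is available; bball, role,    */
/* the 50/25/25 split, and the seed are hypothetical choices.               */

/* Assign each observation to a role once, so both fits share one split */
data bball;
   set sashelp.baseball;
   if _n_ = 1 then call streaminit(12345);   /* fixed seed for repeatability */
   length role $5;
   u = rand('uniform');
   if      u < 0.50 then role = 'TRAIN';
   else if u < 0.75 then role = 'VAL';
   else                  role = 'TEST';
   drop u;
run;

/* LASSO: choose the step with the smallest validation ASE */
proc glmselect data=bball;
   partition rolevar=role(train='TRAIN' validate='VAL' test='TEST');
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     crAtBat crHits crHome crRuns crRbi crBB
                     league division nOuts nAssts nError
         / selection=lasso(choose=validate stop=none);
run;

/* Stepwise: same split, same candidate effects, same CHOOSE= rule */
proc glmselect data=bball;
   partition rolevar=role(train='TRAIN' validate='VAL' test='TEST');
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     crAtBat crHits crHome crRuns crRbi crBB
                     league division nOuts nAssts nError
         / selection=stepwise(choose=validate);
run;
```

The test observations play no part in fitting or in choosing the model, so ASE (Test) in each Fit Statistics table is the fairest number to compare for predictive accuracy. Whether LASSO comes out ahead depends on the data set and on the split.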

PurpleNinja
Obsidian | Level 7

Hello Ksharp,

Could you please tell me where you got these statistics? Did you apply those methods to a data set? If so, could you please tell me where that data set comes from?

Thanks.

Ksharp
Super User
The goodness-of-fit statistics I referred to are from the SAS documentation. There are many examples you can work with in the PROC GLMSELECT documentation.
PurpleNinja
Obsidian | Level 7
Sorry for the very late reply, Ksharp. I forgot about this thread.

Thank you very much for your help!
