neilxu
Calcite | Level 5

This is my first time using GLMSELECT with the LASSO option. However, the procedure ends very quickly, always after 2 steps. I changed the STOP= option but had no luck. The result is also poor: R^2 is below 0.3. I don't understand why it stops so early.

I have more than 200 IVs and only 1 DV (50 records).

Thanks for your input.

3 REPLIES
SteveDenham
Jade | Level 19

Are there any messages in the log or in the output? It may be that your 200 IVs are highly correlated, so that only two steps are needed to find an optimal set. However, it is hard to tell without more information.

Steve Denham

neilxu
Calcite | Level 5

If I specify selection=lasso(stop=ADJRSQ); then SAS stops after 2 steps and shows:

Selection stopped at a local maximum of the AdjRSq criterion.

If I specify selection=lasso(stop=SBC); then SAS stops after 2 steps and shows:

Selection stopped at a local minimum of the SBC criterion.

I only get 2 variables. The AdjRSq is pretty low in either test unless I specify steps=20. With the STEPS= option, the AdjRSq increases. However, the purpose of using lasso is to avoid overfitting. Looking at the selected variables, I believe STEPS= is giving me an overfitted result.

Looking at the correlations between those variables, I don't believe all of them are strongly correlated.

Thanks for your help

SteveDenham
Jade | Level 19

This got me thinking a little bit. I used the example in the SAS/STAT 13.1 documentation, with changes. First, I ran:

proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq);
run;

This stopped after four steps.  Then I ran:

proc glmselect data=sashelp.Leutrain /*valdata=sashelp.Leutest*/
               plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq steps=20);
run;

This ran the full 20 steps, with the optimal value at step 20. So what happened when I did not include the STEPS= option? The adjRsq criterion actually went down when the fifth predictor was included, so the procedure stopped, with an adjusted Rsq of 0.6132. I think this is what is happening with your data. I can get all sorts of answers from this dataset, depending on the combination of options.
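One way to see that dip directly is to ask for the criterion plots and step details alongside the coefficient plot. The following is only a sketch on the same sashelp.Leutrain example; the PLOTS=(COEFFICIENTS CRITERIA) and DETAILS=STEPS settings are additions for illustration and were not part of the runs described above.

```sas
/* Sketch: same model as above, forced out to 20 steps so the whole       */
/* adjusted R-square path is visible, not just the part before the dip.   */
/* PLOTS=CRITERIA and DETAILS=STEPS are illustrative additions.           */
proc glmselect data=sashelp.Leutrain plots=(coefficients criteria);
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq steps=20)
         details=steps;   /* print fit statistics at each selection step */
run;
```

If the criterion panel shows adjusted R-square dropping at one step and then climbing again, that would be consistent with the procedure stopping at a local rather than a global optimum when STEPS= is omitted.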

My personal preferences might be to minimize PRESS, rather than maximizing adjusted Rsquare or minimizing information criteria, especially if I were trying to build a predictive model.
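As a rough sketch of that preference, again on the sashelp.Leutrain example, the CHOOSE= criterion can be switched to PRESS while STEPS= fixes how far the path is followed. This assumes CHOOSE=PRESS is accepted together with LASSO in your release (check the log for notes); the choice of 20 steps is arbitrary and only mirrors the run above.

```sas
/* Sketch: follow the LASSO path for 20 steps, then pick the step with    */
/* the smallest PRESS statistic instead of the largest adjusted R-square. */
/* steps=20 is an arbitrary bound carried over from the earlier example.  */
proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=press steps=20);
run;
```

Because STEPS= fixes the length of the path, the model reported is the best one among those 20 steps by PRESS, not necessarily the last step.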

Steve Denham
