
glmselect with lasso options ends only 2 steps

Contributor
Posts: 33


This is my first time using PROC GLMSELECT with the LASSO option. However, the procedure ends very quickly, always after 2 steps. I changed the STOP= option but had no luck. The result is also really bad: R^2 is below 0.3. I don't understand why it just stops.

I have more than 200 IVs and only 1 DV (50 records).


Thanks for your input.

Respected Advisor
Posts: 2,655

Re: glmselect with lasso options ends only 2 steps

Are there any messages in the log or in the output? It may be that your 200 IVs are highly correlated, so only two steps are needed to find an optimal set. However, it is hard to tell without more information.

Steve Denham

Contributor
Posts: 33

Re: glmselect with lasso options ends only 2 steps

If I specify selection=lasso(stop=ADJRSQ); then SAS stops after 2 steps and shows:

Selection stopped at a local maximum of the AdjRSq criterion.

If I specify selection=lasso(stop=SBC); then SAS stops after 2 steps and shows:

Selection stopped at a local minimum of the SBC criterion.

I only get 2 variables. The AdjRSq is pretty low in either test unless I specify steps=20. With the STEPS= option, the AdjRSq increases. However, the purpose of using LASSO is to avoid overfitting. Looking at the selected variables, I believe STEPS= is giving me an overfitted result.

Looking at the correlations between those variables, I don't believe all of them are strongly correlated.

Thanks for your help

Respected Advisor
Posts: 2,655

Re: glmselect with lasso options ends only 2 steps

This got me thinking a little bit.  I used the example in the SAS/STAT 13.1 documentation, with changes.  First, I ran:

 

proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq);
run;

This stopped after four steps.  Then I ran:

proc glmselect data=sashelp.Leutrain /*valdata=sashelp.Leutest*/
               plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq steps=20);
run;

This went the full 20 steps, with the best adjusted R-square at step 20. So what happened when I did not include the STEPS= option? The adjusted R-square criterion actually went down with the inclusion of the fifth predictor, so the procedure stopped after four steps, with an adjusted R-square of 0.6132. That is a local maximum, not necessarily the best value along the whole path, and I think this is what is happening with your data. I can get all sorts of answers from this dataset, depending on the combination of options.

My personal preference would be to minimize PRESS rather than maximize adjusted R-square or minimize an information criterion, especially if I were trying to build a predictive model.
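If you want to see where the criterion dips and recovers, one option is to turn off the stopping rule and look at the criterion at every step. The following is only a sketch on the same documentation dataset: it assumes STOP=NONE, DETAILS=STEPS, and the CRITERIONPANEL plot request are available in your release, and I have not checked the output it produces.

```sas
/* Sketch: fit the whole LASSO path instead of stopping at the first dip,
   then let CHOOSE= pick the step with the best adjusted R-square. */
proc glmselect data=sashelp.Leutrain plots=(coefficients criterionpanel);
   model y = x1-x7129 /
         selection=LASSO(stop=none choose=adjrsq)
         details=steps;   /* report the fit criteria step by step */
run;
```

Keep in mind that adjusted R-square computed on the training data will tend to favor larger models when there are far more predictors than observations, so the step it picks is not automatically a good predictive model.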
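A minimal sketch of what that might look like for your data is below. The dataset name mydata, the response y, and the predictors x1-x200 are placeholders, since I do not know your actual names, and I am assuming that CHOOSE=PRESS is accepted together with SELECTION=LASSO in your release.

```sas
/* Sketch only: mydata, y, and x1-x200 are hypothetical names. */
/* STOP=NONE fits the full path; CHOOSE=PRESS then selects the step
   with the smallest predicted residual sum of squares. */
proc glmselect data=mydata plots=coefficients;
   model y = x1-x200 /
         selection=LASSO(stop=none choose=press);
run;
```

Separating the two roles this way means the path is not cut short at a local optimum, while the final model is still chosen by a prediction-oriented criterion rather than by how many steps were requested.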

Steve Denham
