glmselect with lasso options ends only 2 steps

06-03-2014 06:06 PM

This is my first time using GLMSELECT with the LASSO option. However, the procedure ends very quickly, always after 2 steps. I changed the STOP= option, but no luck. The result is also really bad: R^2 is below 0.3. I don't understand why it just stops.

I have more than 200 IVs and only 1 DV (50 records).

Thanks for your input.

06-04-2014 08:58 AM

Any messages in the log, or in the output? It may be that your 200 IV are highly correlated, and so only two steps are needed to find an optimal set. However, it is hard to tell without more information.

Steve Denham

06-04-2014 10:11 AM

If I specify selection=lasso(stop=ADJRSQ), SAS stops after 2 steps and shows:

Selection stopped at a local maximum of the AdjRSq criterion.

If I specify selection=lasso(stop=SBC), SAS stops after 2 steps and shows:

Selection stopped at a local minimum of the SBC criterion.

I only get 2 variables. The AdjRSq is pretty low in either test unless I specify steps=20. With the STEPS= option, the AdjRSq increases. However, the purpose of using lasso is to avoid overfitting. Looking at the variables, I believe STEPS= is giving me an overfitted result.

Looking at the correlations between those variables, I don't believe all of them are strongly correlated.

Thanks for your help

06-04-2014 01:22 PM

This got me thinking a little bit. I used the example in the SAS/STAT 13.1 documentation, with changes. First, I ran:

proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq);
run;

This stopped after four steps. Then I ran:

proc glmselect data=sashelp.Leutrain /*valdata=sashelp.Leutest*/
               plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(choose=adjrsq steps=20);
run;

And this went out the full 20 steps, with the optimal value at step 20. OK, what happened when I did not include the steps= option? Well, the adjRsq criterion actually went down with the inclusion of the fifth predictor, and thus, the procedure stops, with an adjusted Rsq of 0.6132. I think this is what is happening with your data. I can get all sorts of answers from this dataset, based on a combination of options.
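One way to check whether an early stop is hiding a better model further along the path is to switch off the stopping rule and let CHOOSE= pick the best step afterwards. A minimal sketch on the same documentation data set, assuming the STOP=NONE suboption is available in your SAS/STAT release (this variant was not one of the runs described above):

```sas
/* Sketch only: same data and model as above, but with no stopping  */
/* criterion, so the LASSO path is not cut off at the first dip in  */
/* adjusted R-square. CHOOSE= then picks the step with the best     */
/* adjusted R-square among all the steps that were actually taken.  */
proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(stop=none choose=adjrsq);
run;
```

Comparing the step chosen here with the four-step run would show whether that first dip in adjusted R-square was only a local maximum.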

My personal preference would be to minimize PRESS, rather than maximize adjusted R-square or minimize an information criterion, especially if I were trying to build a predictive model.

Steve Denham
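The PRESS preference mentioned above could be tried along the following lines. This is only a sketch on the documentation data set, not the original poster's data. Pairing CHOOSE=PRESS with STOP=NONE is an assumption made here so that the first local optimum does not end the search early; it is not something the reply specified.

```sas
/* Sketch only: pick the step on the LASSO path with the smallest   */
/* predicted residual sum of squares (PRESS).                       */
/* STOP=NONE is an assumed choice, used so that the path is not     */
/* ended at the first local optimum of a stopping criterion.        */
proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x7129 /
         selection=LASSO(stop=none choose=press);
run;
```

For the original problem, the data set name and the variable list in the MODEL statement would need to be replaced with the poster's own.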