noetsi
Obsidian | Level 7

I am new to lasso and adaptive lasso. I am trying to limit the number of variables selected and so I ran this code.

 

proc glmselect data=randomdata plots=all;
partition fraction(validate=.3);
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;

 

model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=validate);
run;

 

All of these variables, including the DV, have two levels. I use the variables selected to run a logistic regression on the half of the data held out from this data set. The problem is that when I run this I get:

"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
I have no idea what this means, and no documentation I have seen mentions it. It generates a list of variables to use. I just don't know if this reflects a problem or not.

 

I am also unclear why it renames the variables. This is what is reported.

 

Effects:Intercept pd3_0 pd6_0 pd7_0 pd10_0 pd11_0 pd14_0 pd17_0 pd19_0 pd20_0 pd21_0 pd22_0 pd23_0 pd26_0 pd28_0 pd29_0 pd30_0
I don't know where the _0 comes from unless this is how GLMSELECT treats dummies.
 
A second question: how do you run adaptive lasso instead of lasso? I know it can be done, but I can't figure out how.

 

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello,

 

"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."

Is it a WARNING: or is it an ERROR:?

In any case, the search stopped.

You might indeed get the model from 'that particular moment', but it's not necessarily the best model you could find under better circumstances.

Maybe you are also interested in this video:

LASSO Selection with PROC GLMSELECT
Funda Gunes, 2018
https://video.sas.com/detail/video/3646879895001/lasso-selection-with-proc-glmselect

 

Good luck,

Koen


10 REPLIES 10
sbxkoenk
SAS Super FREQ

Hello @noetsi ,

 

See here for PROC GLMSELECT with LASSO and Adaptive LASSO:

Penalized Regression Methods for Linear Models in SAS/STAT®
Funda Gunes, SAS Institute Inc.

https://support.sas.com/rnd/app/stat/papers/2015/PenalizedRegression_LinearModels.pdf

 

With regard to the naming of the variables: PROC GLMSELECT is not renaming anything. It just creates a dummy (indicator) column for level zero (0) of each CLASS variable, and that's why you get varname_0 in the parameter estimates table.
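
To see where names such as pd3_0 come from, one option is to write out the design matrix that GLMSELECT builds and list its columns. This is a minimal sketch: the data set and variable names follow the original post, only three CLASS variables are used to keep it short, and `design` is a hypothetical output data set name.

```sas
/* Sketch: save the design matrix so the generated column names can be
   inspected. SELECTION=NONE keeps every effect in the model.
   "design" is a hypothetical output data set name. */
proc glmselect data=randomdata outdesign(addinputvars)=design;
   class pd1 pd2 pd3;
   model dvd = pd1 pd2 pd3 / selection=none;
run;

/* List the columns of the saved design matrix in creation order. */
proc contents data=design varnum;
run;
```

The column list should show one indicator column per level of each CLASS variable, named from the variable and the level value.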

 

With regard to:
"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
I will come back to this in 10 minutes!

Cheers,

Koen

sbxkoenk
SAS Super FREQ

Hello,

 

In addition to my previous reply (see above):

 

"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."

To me, that message is quite clear!

The GLMSELECT procedure will not continue the selection= process if every variable that could still be added is linearly dependent on the variables already in the model.
Can you check whether you have identical dummies, or whether some combination of dummies reproduces another dummy exactly?
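
One way to run that check is to fit the full least squares model and see whether SAS reports it as less than full rank. This is only a sketch, and it assumes that dvd and pd1 through pd31 are stored as numeric 0/1 variables, since PROC REG has no CLASS statement.

```sas
/* Sketch: fit the full model on the raw 0/1 variables.
   Assumption: dvd and pd1-pd31 are numeric. If some predictors are exact
   linear combinations of others, PROC REG reports that the model is not
   full rank and identifies the affected parameters. */
proc reg data=randomdata;
   model dvd = pd1-pd31 / tol vif;
run;
quit;
```

Because the PARTITION statement fits on a random subset, it may also be worth repeating the check on the training rows only.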

 

Thanks,

Koen

noetsi
Obsidian | Level 7

No, the dummies don't overlap (or the logistic regression I already ran would not have run, and it did). Reading various comments associated with this "error", it appears that it will always be generated when you choose cross validation in the code. I found this out after I posted this. I don't really understand what the message is telling you, although it certainly explains why selection stops at a specific step.

 

If anyone knows the code for adaptive lasso please let me know.

sbxkoenk
SAS Super FREQ

Hello,

 

If you have no linear dependencies in your explanatory (input) variables and you do not do cross-validation (page 1 OP shows you do validation but not cross-validation), you should not get this error I think.

Maybe you have linear dependencies when only looking at training or validation data instead of all data (?).

 

For adaptive LASSO, use :
selection=lasso(adaptive choose=sbc stop=none)

... or something similar. The key is specifying the ADAPTIVE method in the brackets after lasso.
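
Placed in a complete step, the option above might look like the following sketch. The data set and variable names come from the original post; the numbered range pd1-pd31 is shorthand for the full variable list.

```sas
/* Sketch: adaptive LASSO, choosing the step with the smallest SBC.
   No PARTITION statement is needed because CHOOSE=SBC does not use
   validation data. */
proc glmselect data=randomdata plots=all;
   class pd1-pd31;
   model dvd = pd1-pd31
         / selection=lasso(adaptive choose=sbc stop=none);
run;
```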


Kind regards,

Koen

noetsi
Obsidian | Level 7
I think the statement is just a reminder of what is occurring and is always given when using cross validation, although I am not certain. People who mentioned it did not seem concerned. I ran the adaptive lasso command from the paper you mentioned and got a warning that I also do not understand, but which may be related to the first one.
"The adaptive weights for the LASSO method are not uniquely determined because the full least squares model is singular." *sigh*. I am checking the variables again, but I use them in a logistic regression without problem so they should not overlap.
noetsi
Obsidian | Level 7
The adaptive lasso ran, and had there been true overlap it could not have, so I am confused about what these warnings mean, if they mean anything.
noetsi
Obsidian | Level 7
Following a suggestion made by several I ran the lasso on the entire data set rather than a training data set (I had held half the data out to use in the logistic regression). Even with that I got the same errors. I find this very confusing because I thought LASSO was intended to deal with limited data relative to predictors. And it won't run correctly even on a data set that the logistic regression will run 🙂 I think the issue may be my choice of cross validation to choose alpha although I know of no other way to choose it.
sbxkoenk
SAS Super FREQ

Hello,

 

Your original PROC GLMSELECT code (on page 1) mentions neither a choice of alpha nor cross-validation.

Can you post your final code (or post the LOG --> even better!)?

When replying ... use the "Insert Code" button ( </> ) on the toolbar and paste your LOG in the pop-up window. That way, formatting and structure of the LOG are preserved and some colors are added.
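
For comparison, the difference between a validation partition and k-fold cross-validation in PROC GLMSELECT can be sketched as follows. The data set and variable names follow the original post; the seed value and the choice of 5 folds are arbitrary assumptions.

```sas
/* Sketch: LASSO with the step chosen by 5-fold cross-validation.
   There is no PARTITION statement here; CVMETHOD= controls how the folds
   are formed. The seed and the number of folds are arbitrary choices. */
proc glmselect data=randomdata seed=12345;
   class pd1-pd31;
   model dvd = pd1-pd31
         / selection=lasso(stop=none choose=cv) cvmethod=random(5);
run;
```

By contrast, PARTITION FRACTION(VALIDATE=.3) with CHOOSE=VALIDATE holds out a single random 30% of the rows and picks the step with the lowest validation error.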

 

Thanks,

Koen

noetsi
Obsidian | Level 7
The log is really long. This is the code I ran originally.


ODS graphics on;
proc glmselect data=dvddu plots=all;
partition fraction(validate=.3);
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=validate);
run;

proc glmselect data=dvddu plots=all;
partition fraction(validate=.3);
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(adaptive stop=none choose=validate);
run;


ods graphics off;



/* I got this warning in the log

WARNING: The adaptive weights for the LASSO method are not uniquely determined because the full least squares model is singular. */



And this in the results:

[screenshot of the results output, not preserved]



So I tried a run without validation, in case that was the issue, as some suggest for smaller data sets (440 usable cases, of which about 40 were at one level of the DV).


ODS graphics on;
proc glmselect data=dvddu plots=all;
/*partition fraction(validate=.3);*/
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=sbc);
run;

ods graphics off;



I got no warnings in the log. I got the same message in the results. I looked at documentation on this message but can't figure out if it means the lasso selection is invalid due to data limitations or not.



Thanks for your help. I would really like to use this method and am spending a long time trying to learn it and the code. But I can't find anything to explain this message or to suggest what the issue is. I ran logistic regression on the same data base without issue so it does not seem like a data issue.
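
One possible reading of the message, offered as an assumption rather than a confirmed diagnosis: with the default GLM coding, each two-level CLASS variable contributes two indicator columns (for example pd3_0 and pd3_1), and the second equals 1 minus the first. Once pd3_0 is in the model, pd3_1 is linearly dependent on it and the intercept, so with STOP=NONE the path can only end when the remaining candidates are redundant columns. A sketch for testing this idea is to request reference coding, which keeps one column per two-level variable:

```sas
/* Sketch: same LASSO run as above, but with reference (full-rank) coding
   so that each two-level CLASS variable contributes a single column.
   Whether this removes the message depends on the data. */
proc glmselect data=dvddu;
   class pd1-pd31 / param=ref;
   model dvd = pd1-pd31 / selection=lasso(stop=none choose=sbc);
run;
```

If the message no longer appears with this coding, that would suggest it was describing the redundant second-level columns rather than a problem in the data.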



