I am new to LASSO and adaptive LASSO. I am trying to limit the number of variables selected, so I ran this code:
proc glmselect data=randomdata plots=all;
partition fraction(validate=.3);   /* hold out about 30% of rows for validation */
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=validate);   /* keep the step with the smallest validation error */
run;
All of these variables, including the DV, have two levels. I use the selected variables to run a logistic regression with half of the data held out from this data set. The problem is that when I run this, I get the following.
I am also unclear why it renames the variables. This is what is reported.
Accepted Solutions
Hello,
"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
Is it a WARNING or an ERROR?
In any case, the search stopped.
You might indeed get the model from 'that particular moment', but it's not necessarily the best model you could find under better circumstances.
Maybe you are also interested in this video:
LASSO Selection with PROC GLMSELECT
Funda Gunes, 2018
https://video.sas.com/detail/video/3646879895001/lasso-selection-with-proc-glmselect
Good luck,
Koen
Hello @noetsi ,
See here for PROC GLMSELECT with LASSO and Adaptive LASSO:
Penalized Regression Methods for Linear Models in SAS/STAT®
Funda Gunes, SAS Institute Inc.
https://support.sas.com/rnd/app/stat/papers/2015/PenalizedRegression_LinearModels.pdf
With regard to the naming of the variables: PROC GLMSELECT is not renaming anything. It just creates a dummy (indicator) variable for level zero (0) of each CLASS variable, and that's why you get varname_0 in the parameter estimates table.
With regard to
"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model.",
I'll come back to this in 10 minutes!
Cheers,
Koen
Hello,
In addition to my previous reply (see above):
"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
To me, that message is quite clear!
The GLMSELECT procedure will not continue the selection process if adding a variable would make the variables in the model linearly dependent on one another.
Can you check whether you have identical dummies, or whether some combination of dummies adds up exactly to another dummy?
Thanks,
Koen
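A minimal sketch of that check, assuming dvd and pd1-pd31 are stored as numeric 0/1 variables in DVDDU (character variables would need recoding first). A predictor that takes only one value is collinear with the intercept, which the one-way frequency tables make easy to spot. If the predictors are exactly linearly dependent, PROC REG notes that the model is not full rank and lists which variables are linear combinations of the others.

```sas
/* Assumption: dvd and pd1-pd31 are numeric 0/1 variables in DVDDU. */

/* A predictor with only one observed level is collinear with the   */
/* intercept; the one-way tables make that easy to spot.            */
proc freq data=dvddu;
   tables pd1-pd31 / nocum;
run;

/* If the predictors are exactly linearly dependent, PROC REG notes */
/* that the model is not full rank and shows which variables are    */
/* linear combinations of others. VIF and COLLIN add diagnostics    */
/* for near (not exact) collinearity.                               */
proc reg data=dvddu;
   model dvd = pd1-pd31 / vif collin;
run;
quit;
```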
No, the dummies don't overlap (otherwise the logistic regression I already ran would not have run, and it did). Reading various comments about this "error", it appears that it is always generated when you choose cross-validation in the code. I found this out after I posted. I don't really understand what the message is telling me, although it certainly explains why the selection stops at a specific step.
If anyone knows the code for adaptive LASSO, please let me know.
Hello,
If you have no linear dependencies in your explanatory (input) variables and you do not do cross-validation (the code in the original post on page 1 uses validation, not cross-validation), I think you should not get this error.
Maybe you have linear dependencies when looking only at the training or validation data instead of all the data (?).
For adaptive LASSO, use:
selection=lasso(adaptive choose=sbc stop=none)
... or something similar. The key is specifying the ADAPTIVE method in the brackets after lasso.
Kind regards,
Koen
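To follow up on the idea that a dependency may exist only in the training rows, here is a sketch that makes the split explicit and reproducible, so the training subset can be checked on its own and adaptive LASSO then runs on exactly the same split. The data set name DVDDU_SPLIT, the variable ROLE, and the seed value are hypothetical, and the predictors are assumed to be numeric 0/1.

```sas
/* Hypothetical names: DVDDU_SPLIT (output data set), ROLE (split      */
/* variable), seed 20210927. Assumes numeric 0/1 dvd and pd1-pd31.      */
data dvddu_split;
   set dvddu;
   length role $8;
   /* Seed the RAND stream once so the split is reproducible. */
   if _n_ = 1 then call streaminit(20210927);
   /* Send roughly 30% of rows to validation, as FRACTION(VALIDATE=.3) does. */
   if rand('uniform') < 0.3 then role = 'VALIDATE';
   else role = 'TRAIN';
run;

/* Check the training rows alone for exact linear dependencies. */
proc reg data=dvddu_split(where=(role = 'TRAIN'));
   model dvd = pd1-pd31;
run;
quit;

/* Adaptive LASSO on exactly the same split. */
proc glmselect data=dvddu_split plots=all;
   partition rolevar=role(train='TRAIN' validate='VALIDATE');
   class pd1-pd31;
   model dvd = pd1-pd31
         / selection=lasso(adaptive stop=none choose=validate);
run;
```

With 440 usable cases, a 30% holdout leaves roughly 300 training rows, so a predictor level that is rare in the full data could be missing from the training rows, making that predictor constant there.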
"The adaptive weights for the LASSO method are not uniquely determined because the full least squares model is singular." *sigh*. I am checking the variables again, but I use them in a logistic regression without problems, so they should not overlap.
Hello,
Your original PROC GLMSELECT code (on page 1) mentions neither alpha choice nor cross-validation.
Can you post your final code (or, even better, the LOG)?
When replying, use the "Insert Code" button ( </> ) on the toolbar and paste your LOG in the pop-up window. That way, the formatting and structure of the LOG are preserved and some colors are added.
Thanks,
Koen
ODS graphics on;
proc glmselect data=dvddu plots=all;
partition fraction(validate=.3);
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=validate);
run;
proc glmselect data=dvddu plots=all;
partition fraction(validate=.3);
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(adaptive stop=none choose=validate);
run;
ods graphics off;
/* I got this warning in the log
WARNING: The adaptive weights for the LASSO method are not uniquely determined because the full least squares model is singular. */
And this in the results
So I tried a run without the validation partition, in case that was the issue, as some suggest for smaller databases (440 usable cases, of which about 40 were at one level of the DV).
ODS graphics on;
proc glmselect data=dvddu plots=all;
/*partition fraction(validate=.3);*/
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=sbc);
run;
ods graphics off;
I got no warnings in the log, but I got the same message in the results. I looked at the documentation on this message but can't figure out whether it means the LASSO selection is invalid due to data limitations.
Thanks for your help. I would really like to use this method and am spending a long time trying to learn it and the code, but I can't find anything that explains this message or suggests what the issue is. I ran a logistic regression on the same database without issue, so it does not seem to be a data issue.
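One possibility worth checking (an assumption, not something confirmed in this thread): as far as I know, the CLASS statement in PROC GLMSELECT uses GLM parameterization by default, which creates one indicator column per level. For a two-level variable plus an intercept, those columns are linearly dependent by construction, so the full least-squares model would be singular even if the data themselves have no overlap. PROC LOGISTIC uses full-rank effect coding by default, which could explain why the logistic regression runs cleanly. A sketch requesting reference coding instead; the seed value is hypothetical and only makes the random partition repeatable.

```sas
/* Sketch under an assumption: the singular full model comes from the  */
/* default GLM (one column per level) coding, not from the data.       */
ods graphics on;
proc glmselect data=dvddu plots=all seed=20210927;  /* hypothetical seed */
   partition fraction(validate=.3);
   /* PARAM=REF omits the column for the reference level (the last     */
   /* level by default), so each two-level variable contributes one    */
   /* indicator column instead of two.                                  */
   class pd1-pd31 / param=ref;
   model dvd = pd1-pd31
         / selection=lasso(adaptive stop=none choose=validate);
run;
ods graphics off;
```

If pd1-pd31 are already numeric 0/1, leaving them out of the CLASS statement entirely should give an equivalent full-rank design, with level 0 acting as the reference.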