I am new to lasso and adaptive lasso. I am trying to limit the number of variables selected and so I ran this code.
proc glmselect data=randomdata plots=all;
partition fraction(validate=.3);
class pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31 ;
model dvd = pd1 pd2 pd3 pd4 pd5 pd6 pd7 pd8 pd9 pd10 pd11 pd12 pd13 pd14 pd15 pd16 pd17
pd18 pd19 pd20 pd21 pd22 pd23 pd24 pd25 pd26 pd27 pd28 pd29 pd30 pd31
/ selection=lasso(stop=none choose=validate);
run;
All of these variables, including the DV, have two levels. I use the variables selected to run logistic regression with half of the data held out from this data set. The problem is that when I run this I get the message "Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
I am also unclear why it renames the variables in what is reported.
Hello,
"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
Is it a WARNING: or is it an ERROR:?
In any case, the search stopped.
You do get the model as it stood at that particular moment, but it is not necessarily the best model you could have reached under better circumstances.
Maybe you are also interested in this video:
LASSO Selection with PROC GLMSELECT
Funda Gunes, 2018
https://video.sas.com/detail/video/3646879895001/lasso-selection-with-proc-glmselect
Good luck,
Koen
Hello @noetsi ,
See here for PROC GLMSELECT with LASSO and Adaptive LASSO:
Penalized Regression Methods for Linear Models in SAS/STAT®
Funda Gunes, SAS Institute Inc.
https://support.sas.com/rnd/app/stat/papers/2015/PenalizedRegression_LinearModels.pdf
With regard to the naming of the variables: PROC GLMSELECT is not renaming anything. It just creates a dummy variable for level zero (0) of each CLASS variable, and that's why you get varname_0 in the parameter estimates table.
With regard to "Selection stopped because all candidate effects for entry are linearly dependent on effects in the model.": I will come back to this in 10 minutes!
Cheers,
Koen
Hello,
On top of my previous reply (see above):
"Selection stopped because all candidate effects for entry are linearly dependent on effects in the model."
To me, that message is quite clear!
The GLMSELECT procedure will not continue the selection= process if adding a variable would make the variables in the model linearly dependent on one another.
Can you check whether you have identical dummies, or whether adding some dummies together results in exactly another dummy?
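One way to run that check is sketched below. It assumes that pd1-pd31 and dvd are numeric 0/1 variables in the randomdata data set from the original post; if they are character variables, these steps will not run as written.

```sas
/* Assumption: pd1-pd31 are numeric 0/1 indicators in randomdata. */

/* Pairwise check: a correlation of exactly 1 or -1 between two   */
/* dummies means one duplicates (or exactly mirrors) the other.   */
proc corr data=randomdata nosimple noprob;
   var pd1-pd31;
run;

/* Multi-variable check: if some dummies add up to another one,   */
/* PROC REG notes that the model is not full rank and lists the   */
/* linear combinations it found among the regressors.             */
proc reg data=randomdata;
   model dvd = pd1-pd31;
run;
quit;
```

The response variable plays no role in the rank check; it is only there because PROC REG needs a MODEL statement.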
Thanks,
Koen
No, the dummies don't overlap (otherwise the logistic regression I already ran would not have run, and it did). From reading various comments about this "error", it appears that it will always be generated when you choose cross validation in the code. I found this out after I posted this. I don't really understand what the message is telling you, although it certainly explains why selection stops at a specific step.
If anyone knows the code for adaptive lasso please let me know.
Hello,
If you have no linear dependencies among your explanatory (input) variables and you do not do cross-validation (the original post on page 1 shows that you do validation but not cross-validation), you should not get this error, I think.
Maybe you have linear dependencies when looking only at the training data or only at the validation data instead of all the data?
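A rough sketch of how to look at the training rows only is given below. The data set name split, the variable role, the seed 12345, and the 30% cut-off are all illustrative assumptions, and the pd variables are again assumed to be numeric 0/1.

```sas
/* Hypothetical explicit split, so that the rows inspected here   */
/* are the same rows the selection would later train on.          */
data split;
   set randomdata;
   call streaminit(12345);                 /* arbitrary seed      */
   if rand('uniform') < 0.3 then role = 'VALIDATE';
   else role = 'TRAIN';
run;

/* A dummy that is constant within the training rows is linearly  */
/* dependent on the intercept; MIN = MAX flags such a variable.   */
proc means data=split min max;
   where role = 'TRAIN';
   var pd1-pd31;
run;

/* Rank check restricted to the training rows. */
proc reg data=split;
   where role = 'TRAIN';
   model dvd = pd1-pd31;
run;
quit;
```

To make PROC GLMSELECT use the same split, the PARTITION statement can point at the role variable instead of drawing a random fraction: partition rolevar=role(train='TRAIN' validate='VALIDATE');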
For adaptive LASSO, use:
selection=lasso(adaptive choose=sbc stop=none)
... or something similar. The key is specifying the ADAPTIVE option in the parentheses after lasso.
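Placed in a full call, that could look like the sketch below. It reuses the data set and variable names from the original post and keeps the 30% validation partition from that code; the numbered range pd1-pd31 is shorthand for listing all 31 variables.

```sas
/* Adaptive LASSO sketch based on the code in the original post.  */
proc glmselect data=randomdata plots=all;
   partition fraction(validate=.3);
   class pd1-pd31;
   model dvd = pd1-pd31
         / selection=lasso(adaptive stop=none choose=validate);
run;
```

Since the original goal was to limit the number of variables selected, STOP= can also take a number instead of NONE: for example, stop=10 (an arbitrary illustrative value) stops selection at the first step at which the model has 10 effects.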
Kind regards,
Koen
Hello,
Your original PROC GLMSELECT code (on page 1) mentions neither a choice of alpha nor cross-validation.
Can you post your final code (or post the LOG --> even better!)?
When replying ... use the "Insert Code" button ( </> ) on the toolbar and paste your LOG in the pop-up window. That way, formatting and structure of the LOG are preserved and some colors are added.
Thanks,
Koen