Hello Expert!
I'm wondering how to perform a 10-fold cross-validation to retrieve the shrinkage parameter for Ridge regression.
When I run this code:
proc glmselect data=train plots=all;
model Y= X1 X2 X3 /
selection=elasticnet(choose=cv l1=0 l2search=grid stop=l1)
cvmethod=split(10);
run;
The Ridge shrinkage criterion is not displayed in the output.
Thank you for your help.
I have just done this:
ods graphics on;
proc glmselect data=sashelp.Leutrain
plots=coefficients;
model y = x1-x4129/
selection=elasticnet(choose=cv l1=0 l2search=grid stop=l1 SHOWL2SEARCH SHOWSTEPL1)
cvmethod=split(10);
run;
and I get (among much other output):
I think the value you want to see is at the bottom of the table ("Chosen Value of L2").
BR,
Koen
Thank you, Ksharp!
So it is certain that I cannot perform automatic Ridge cross-validation in SAS, right? Is there an existing macro that performs k-fold Ridge cross-validation?
OK. Check whether the following code is what you are looking for.
data acetyl;
input x1-x4 @@;
x1x2 = x1 * x2;
x1x1 = x1 * x1;
datalines;
1300 7.5 0.012 49 1300 9 0.012 50.2 1300 11 0.0115 50.5
1300 13.5 0.013 48.5 1300 17 0.0135 47.5 1300 23 0.012 44.5
1200 5.3 0.04 28 1200 7.5 0.038 31.5 1200 11 0.032 34.5
1200 13.5 0.026 35 1200 17 0.034 38 1200 23 0.041 38.5
1100 5.3 0.084 15 1100 7.5 0.098 17 1100 11 0.092 20.5
1100 17 0.086 29.5
;
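/****** K-Fold CV ****/
%macro k_fold_cv(k=);
  filename score temp;
  ods select none;

  /* Randomly assign every observation to one of k folds (variable GroupID) */
  proc surveyselect data=acetyl group=&k out=have seed=123;
  run;

  %do i=1 %to &k;
    /* The current fold is the test set; all other folds form the training set */
    data training;
      set have(where=(groupid ne &i));
    run;
    data test;
      set have(where=(groupid eq &i));
    run;

    /* Fit on the training folds with a fixed ridge parameter and write scoring code */
    proc reg data=training outest=b ridge=0.02 noprint;
      model x4=x1 x2 x3 x1x2 x1x1;
      code file=score residual;
    quit;

    /* Score test data */
    data score;
      set test;
      %include score;
    run;

    /* Calculate PRESS (mean squared residual on the test fold) */
    proc sql;
      create table press as
      select uss(R_x4)/count(*) as press from score;
    quit;

    /* Keep the ridge value, the fold's PRESS, and the fold id */
    data score&i;
      merge b(where=(_TYPE_='RIDGE') keep=_TYPE_ _RIDGE_) press;
      retain id &i;
    run;
  %end;

  /* Stack the per-fold results */
  data k_fold_cv;
    set score1-score&k;
  run;
  ods select all;
%mend;
%k_fold_cv(k=3)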
I haven't done this myself, but there is an example in the GLMSELECT documentation that appears to use the ELASTICNET method to do what you want. Here are some links that I hope will be useful:
I think the answer is in this paper:
PharmaSUG 2019 - Paper ST-059
Regulation Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets
Deanna Schreiber-Gregory, Henry M Jackson Foundation
Karlen Bader, Henry M Jackson Foundation
https://pharmasug.org/proceedings/2019/ST/PharmaSUG-2019-ST-059.pdf
(end of page 17 and start of page 18)
If you have a good estimate of L2, you can specify the value in the L2= option. If you do not specify a value for L2, then by default PROC GLMSELECT searches for a value between 0 and 1 that is optimal according to the current CHOOSE= criterion. Figure 48.12 illustrates the estimation of the ridge regression parameter λ2 (L2). If you do not specify the CHOOSE= option, then the model at the final step in the selection process is selected for each λ2 (L2), and the criterion value shown in that figure is the one at the final step that corresponds to the specified STOP= option (STOP=SBC by default).
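To make the two cases described above concrete, here is a minimal sketch, not tested on your data. It reuses the data set and variable names from the original question (train, Y, X1-X3). The value 0.1 and the search bounds are arbitrary placeholders, and the L2LOW= and L2HIGH= suboptions are written from memory of the ELASTICNET suboption list, so please check them against the GLMSELECT documentation for your release.

```sas
/* Case 1: fix the ridge parameter yourself with L2= (0.1 is only a placeholder) */
proc glmselect data=train;
  model Y = X1 X2 X3 /
    selection=elasticnet(l2=0.1 choose=cv)
    cvmethod=split(10);
run;

/* Case 2: omit L2= so that GLMSELECT searches for it.                    */
/* L2LOW= and L2HIGH= (assumed suboption names) narrow the default 0-to-1 */
/* search range; SHOWL2SEARCH is the suboption used earlier in this       */
/* thread to print the search table.                                      */
proc glmselect data=train;
  model Y = X1 X2 X3 /
    selection=elasticnet(choose=cv l2search=grid l2low=0.001 l2high=0.5
                         showl2search)
    cvmethod=split(10);
run;
```

In the second call the chosen L2 value should be reported with the L2 search output, as in the table mentioned earlier in this thread.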
BR,
Koen
That figure,
Figure 48.12 Estimation of the Ridge Regression Parameter λ2 (L2) in the Elastic Net Method,
is in this document:
https://support.sas.com/documentation/onlinedoc/stat/132/glmselect.pdf
... in "Model Selection Issues" on p. 3713
I have just run a PROC GLMSELECT myself, but I cannot find it either.
I think it's best to contact SAS Technical Support for this.
SAS Technical Support (TS):
Here is a link for your convenience: https://support.sas.com/en/technical-support.html#contact
Once you've sorted it out or received a helpful reply from TS, it would be good to add the answer to this thread. The open question is: where or how can you find the best L2 value found by the Elastic Net method?
BR,
Koen