SASdevAnneMarie
Rhodochrosite | Level 12

Hello Expert!

I'm wondering how to perform a 10-fold cross-validation to retrieve the parameter for Ridge regression.

When I run this code:

proc glmselect data=train plots=all;
	model Y= X1 X2 X3 /
		selection=elasticnet(choose=cv  l1=0 l2search=grid stop=l1)
		cvmethod=split(10);
run;

The Ridge shrinkage criterion is not displayed in the output.

Thank you for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

I have just done this:

ods graphics on;
proc glmselect data=sashelp.Leutrain 
               plots=coefficients;
   model y = x1-x4129/
   selection=elasticnet(choose=cv l1=0 l2search=grid stop=l1 SHOWL2SEARCH SHOWSTEPL1)
		     cvmethod=split(10);
run;

and I get (among much other output):

[Screenshot: PROC GLMSELECT output table, sbxkoenk_0-1778689996000.png]

I think you want to see that value in the bottom of the table ("Chosen Value of L2").
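To capture that row in a data set rather than read it off the listing, one option is to let ODS report the name of every output object the procedure produces. The sketch below only wraps the call above in ODS TRACE; it does not assume any particular table name, since the name has to be read from the log.

```sas
/* Sketch: list the ODS output objects produced by the call above.    */
/* ODS TRACE writes each object's name, label and path to the SAS log. */
ods trace on;
ods graphics on;
proc glmselect data=sashelp.Leutrain plots=coefficients;
   model y = x1-x4129 /
      selection=elasticnet(choose=cv l1=0 l2search=grid stop=l1
                           showl2search showstepl1)
      cvmethod=split(10);
run;
ods trace off;
```

Once the log shows which object holds the "Chosen Value of L2" row, an ODS OUTPUT statement with that object name, placed before the PROC GLMSELECT step, should save it to a data set.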

 

BR,

Koen


22 REPLIES
SASdevAnneMarie
Rhodochrosite | Level 12

Thank you, Ksharp!
So it is certain that I cannot perform automatic Ridge cross-validation in SAS, right? Is there an existing macro that performs K-fold Ridge cross-validation?

Ksharp
Super User
"I cannot perform automatic Ridge cross-validation in SAS,"
I think so. But maybe some SAS expert knows of something newer.

"Is there an existing macro that performs K-fold Ridge cross-validation?"
You could replace PROC LOGISTIC in my code with PROC REG and its RIDGE= option to get what you need.
SASdevAnneMarie
Rhodochrosite | Level 12
Thank you, but for Ridge regression it's more complicated: I need to vary lambda and compare PRESS.
Ksharp
Super User

OK. Check whether the following code is what you are looking for.

 

data acetyl;
input x1-x4 @@;
x1x2 = x1 * x2;
x1x1 = x1 * x1;
datalines;
1300  7.5 0.012 49   1300  9   0.012  50.2 1300 11 0.0115 50.5
1300 13.5 0.013 48.5 1300 17   0.0135 47.5 1300 23 0.012  44.5
1200  5.3 0.04  28   1200  7.5 0.038  31.5 1200 11 0.032  34.5
1200 13.5 0.026 35   1200 17   0.034  38   1200 23 0.041  38.5
1100  5.3 0.084 15   1100  7.5 0.098  17   1100 11 0.092  20.5
1100 17   0.086 29.5
;
 


/****** K-Fold CV ****/
%macro k_fold_cv(k=);
filename score temp;

ods select none;

proc surveyselect data=acetyl group=&k out=have seed=123;
run;

%do i=1 %to &k ;
data training;
 set have(where=(groupid ne &i)) ;
run;
data test;
 set have(where=(groupid eq &i));
run;

proc reg data=training outest=b ridge=0.02 noprint;
   model x4=x1 x2 x3 x1x2 x1x1;
   code file=score residual ;
quit;

/*Score test data*/
data score;
set test;
%include score;
run;
/*Calculate PRESS per observation: mean squared prediction error on the held-out fold*/
proc sql;
create table press as
select uss(R_x4)/count(*) as press from score;
quit;

data score&i;
 merge b(where=(_TYPE_='RIDGE') keep=_TYPE_ _RIDGE_) press;
 retain id &i ;
run;
%end;
data k_fold_cv;
 set score1-score&k;
run;

ods select all;
%mend;

%k_fold_cv(k=3)
SASdevAnneMarie
Rhodochrosite | Level 12
Thank you, but here, the ridge is fixed: ridge = 0.02. We need to vary it and select the best model based on the ridge value.
Ksharp
Super User
Sorry, I can't help you. It seems like a tough task.
Maybe @Rick_SAS or @StatDave could give you a hint.
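One possible way to vary the ridge value instead of fixing it is sketched below. It reuses the acetyl data and the model from the macro above and relies on the fact that RIDGE= in PROC REG accepts a list of values, so OUTEST= holds one _TYPE_='RIDGE' row per value. The macro name ridge_cv, the parameter lambdas, its default grid, and the data set names fold1, ..., all_folds and cv_summary are assumptions made for this illustration. Predictions are computed by hand in PROC SQL, so the sketch does not depend on how the CODE statement treats ridge estimates. Note also that the RIDGE= values in PROC REG are not necessarily on the same scale as the L2 value in PROC GLMSELECT.

```sas
/* Sketch: K-fold CV over a grid of ridge values (names are illustrative) */
%macro ridge_cv(k=, lambdas=0 to 0.1 by 0.01);

/* Assign each observation to one of &k random groups (variable GroupID) */
proc surveyselect data=acetyl groups=&k out=have seed=123 noprint;
run;

%do i=1 %to &k;
   /* Fit every ridge value in the grid on the training folds */
   proc reg data=have(where=(groupid ne &i)) outest=b ridge=&lambdas noprint;
      model x4 = x1 x2 x3 x1x2 x1x1;
   run;
   quit;

   /* Cross join: each held-out row is scored with each ridge coefficient row */
   proc sql;
      create table fold&i as
      select &i as fold,
             c._RIDGE_ as ridge,
             sum((t.x4 - (c.Intercept + c.x1*t.x1 + c.x2*t.x2 + c.x3*t.x3
                          + c.x1x2*t.x1x2 + c.x1x1*t.x1x1))**2) as sse,
             count(*) as n
      from have as t, b as c
      where t.groupid = &i and c._TYPE_ = 'RIDGE'
      group by c._RIDGE_;
   quit;
%end;

data all_folds;
   set fold1-fold&k;
run;

/* Pool the folds: mean squared prediction error per ridge value, best first */
proc sql;
   create table cv_summary as
   select ridge, sum(sse)/sum(n) as cv_mse
   from all_folds
   group by ridge
   order by cv_mse;
quit;

proc print data=cv_summary;
run;
%mend;

%ridge_cv(k=3)
```

The first row of cv_summary then carries the ridge value with the smallest cross-validated error on this grid. With only 16 observations the result will depend heavily on the seed and on the number of folds.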
Rick_SAS
SAS Super FREQ

I haven't done this myself, but there is an example in the GLMSELECT documentation that appears to use the ELASTICNET method to do what you want. Here are some links that I hope will be useful:

 

Ksharp
Super User
Rick,
The OP already used the method you mention; check the first post.
The OP wants the "Ridge shrinkage criterion", but PROC GLMSELECT didn't display it.
SASdevAnneMarie
Rhodochrosite | Level 12
Perhaps there is still an option in SAS that allows doing this without using a macro?
SASdevAnneMarie
Rhodochrosite | Level 12
Thank you, Rick, but I cannot find the chosen l2search value in the output:

ods output FitStatistics=fit;
ods output CriterionPanel=CriterionPanel;
ods output ModelInfo=ModelInfo;
ods output Dimensions=Dimensions;
ods output SelectionSummary=SelectionSummary;
ods output StopReason=StopReason;
ods output CoefficientPanel=CoefficientPanel;
ods output CriterionPanel=CriterionPanel;
ods output ASEPlot=ASEPlot;
ods output SelectedEffects=SelectedEffects;
ods output ANOVA=ANOVA;
ods output FitStatistics=FitStatistics;
ods output ParameterEstimates=ParameterEstimates;
sbxkoenk
SAS Super FREQ

I think the answer is in this paper:

 

PharmaSUG 2019 - Paper ST-059

Regulation Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets

Deanna Schreiber-Gregory, Henry M Jackson Foundation

Karlen Bader, Henry M Jackson Foundation
https://pharmasug.org/proceedings/2019/ST/PharmaSUG-2019-ST-059.pdf

(end of page 17 and start of page 18)

 

If you have a good estimate of λ2, you can specify the value in the L2= option. If you do not specify a value for λ2, then by default PROC GLMSELECT searches for a value between 0 and 1 that is optimal according to the current CHOOSE= criterion. Figure 48.12 illustrates the estimation of the ridge regression parameter λ2 (L2). Meanwhile, if you do not specify the CHOOSE= option, then the model at the final step in the selection process is selected for each λ2 (L2), and the criterion value shown in that figure is the one at the final step that corresponds to the specified STOP= option (STOP=SBC by default).
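For orientation, the role of λ2 can be written out. This is a sketch in generic notation (y the response, X the design matrix, β the coefficients, F_k the k-th of K folds), up to the scaling conventions the procedure applies internally, and not a transcription of the documentation. The elastic net estimate is

$$\hat\beta(\lambda_1,\lambda_2) = \arg\min_{\beta}\; \lVert y - X\beta \rVert^2 + \lambda_1 \sum_j \lvert \beta_j \rvert + \lambda_2 \sum_j \beta_j^2 ,$$

which reduces to ridge regression when λ1 = 0. With CHOOSE=CV, a natural reading is that each candidate λ2 is scored by the sum of squared prediction errors over the held-out folds,

$$\mathrm{CV}(\lambda_2) = \sum_{k=1}^{K} \sum_{i \in F_k} \bigl( y_i - x_i^{\top} \hat\beta^{(-k)}(\lambda_2) \bigr)^2 ,$$

where \hat\beta^{(-k)} is the estimate fitted without fold k, and the λ2 with the smallest value is the one chosen.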

 

BR,

Koen

SASdevAnneMarie
Rhodochrosite | Level 12
Hello Koen,
Thank you for your response, but I cannot find the chosen value L2SEARCH (the best L2) displayed in the output. Furthermore, figure 48.12 is not found in the indicated document.
sbxkoenk
SAS Super FREQ

That "Figure 48.12 Estimation of the Ridge Regression Parameter λ2 (L2) in the Elastic Net Method"

is in this document:

https://support.sas.com/documentation/onlinedoc/stat/132/glmselect.pdf

... in "Model Selection Issues" on p. 3713

 

I have just run a PROC GLMSELECT myself, but I cannot find it either.

I think it's best to contact SAS Technical Support for this.

 

SAS Technical Support (TS):
Here is a link for your convenience: https://support.sas.com/en/technical-support.html#contact

 

Once you’ve sorted it out or received a helpful reply from TS, it would be good to add the answer you were looking for to this topic-thread. Where or how can you find the best L2 value that was found by the Elastic Net Method?

 

BR,

Koen

