RyanSimmons
Pyrite | Level 9

I am currently experimenting with the LASSO, adaptive LASSO, and elastic net methods in PROC GLMSELECT in SAS 9.4.

 

One issue I keep running into, however, is that GLMSELECT doesn't seem to have an option to display the selected value of the regularization/constraint parameter used in these techniques. For all three methods, you can either provide a value for the penalization parameter explicitly or let the procedure determine it automatically. For example, for elastic net, if you don't specify a value for L2 (the ridge regression penalty parameter), SAS searches for the optimal value of L2 over a range according to the specified CHOOSE method.

 

However, as far as I can find, SAS never reports or outputs the value of this constraint parameter. This is problematic for a number of reasons. The SAS documentation notes that you should specify the value of L2 if you have a good estimate of what the constraint parameter should be, but SAS provides no way to obtain such an estimate. Also, if you want to use the model averaging functionality of GLMSELECT in combination with the elastic net method, you MUST specify a value of L2 (if you don't, SAS returns an error).

 

Ideally, you would be able to run GLMSELECT once with elastic net to determine an optimal value of L2 and then plug it into the model averaging. However, I cannot find anything in the standard output or documentation that makes this possible. So, am I missing something? Is there some way to force SAS to report the explicit values of the regularization/constraint parameters that it necessarily estimates as part of these penalized regression methods? If not, what is a sensible way to determine reasonable values in those situations where they must be provided explicitly?

8 REPLIES
SteveDenham
Jade | Level 19

This looks like a work in progress to me, but here are some of the steps I would try:

 

1. Specify the L2SEARCH= option explicitly.

2. Turn on ODS tracing (ODS TRACE ON).

3. Check the log for tables that may contain the values.

 

I can't guarantee anything at this point, but that is where I would start.

 

Steve Denham

RyanSimmons
Pyrite | Level 9

Steve,

 

That's a great idea! Unfortunately, I haven't found anything particularly fruitful following that advice, but it was a good idea, and there is still the possibility that some obscure option I haven't explored yet will sneakily output it to one of those datasets. Hopefully someone at SAS will see this thread and make it more straightforward for us!

 

Thanks,

 

Ryan

lvm
Rhodochrosite | Level 12

You can always contact SAS technical support. They will give you an answer in 24 hours.

RyanSimmons
Pyrite | Level 9

Thanks, lvm. I did contact SAS technical support, and will update this thread with any solutions.

 

In the meantime, I have two ideas for how to overcome the problem, but they each have issues of their own:

 

1) It is possible to use ridge regression in PROC REG. Since the L2= specification in elastic net is a ridge regression parameter, it may be possible to tune the ridge parameter in PROC REG and then carry it over to PROC GLMSELECT. Doing so seems to give reasonable results. However, PROC REG does not have a built-in method for optimizing the ridge parameter; it simply fits the model over a specified range of values and reports the resulting estimates. Further, this is fairly ad hoc, since there is no guarantee that the optimal parameter in a ridge regression is equivalent to the optimal L2 parameter in the elastic net setting.

 

2) Using the R glmnet package, one can run an elastic net regression and cross-validate to get optimal values for the various parameters. I figured this might be a decent way to get initial values to feed into PROC GLMSELECT (because, after all, for a variety of reasons I want to keep the majority of the analysis in SAS). However, the glmnet package uses a different parameterization of the elastic net than PROC GLMSELECT, and trying to "convert" them doesn't result in sensible values.

 

The R package has one lambda parameter and an alpha parameter that describes the amount of "mixing" (i.e. the weights given to the ridge and LASSO penalties in the elastic net). In theory, lambda*(1-alpha) should give you L2 (and lambda*alpha should give you L1, since glmnet uses alpha=1 for the pure LASSO), but the values I get from R in this fashion and plug into SAS (with alpha=0.5) give me radically different results. Without the ability to check the value of L1 that SAS is using (and with GLMSELECT apparently not allowing you to specify both an L1 and an L2 value), I don't see how to cross-reference.
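One possible source of the mismatch is scaling rather than the mixing weights alone. glmnet documents its Gaussian objective as RSS/(2n) + lambda*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2). The sketch below converts (lambda, alpha) to penalties on an unscaled objective of the form RSS + L1*|b|_1 + L2*|b|_2^2. That unscaled form, and the assumption that both programs standardize the predictors in the same way, should be checked against the PROC GLMSELECT documentation. The function name and the example values are hypothetical.

```python
def glmnet_to_unscaled_penalties(lam, alpha, n_obs):
    """Map glmnet's (lambda, alpha) to (L1, L2) on an unscaled RSS objective.

    glmnet minimizes   RSS/(2n) + lam * (alpha*|b|_1 + (1 - alpha)/2 * |b|_2^2).
    Multiplying by 2n: RSS + (2n*lam*alpha)*|b|_1 + (n*lam*(1 - alpha))*|b|_2^2.
    ASSUMPTION: the target software penalizes the unscaled RSS in this form.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = 2.0 * n_obs * lam * alpha
    l2 = n_obs * lam * (1.0 - alpha)
    return l1, l2


# Made-up example values, for illustration only.
l1, l2 = glmnet_to_unscaled_penalties(lam=0.1, alpha=0.5, n_obs=200)
print("L1 =", l1, "L2 =", l2)
```

Under these assumptions the converted penalties grow with the sample size, so plugging lambda*alpha or lambda*(1-alpha) in directly could be off by a factor on the order of n.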

smrose
Fluorite | Level 6

Did you ever find a solution to this problem? 

StatDave
SAS Super FREQ

One alternative is to do the regularization in PROC NLMIXED where you can explicitly specify the penalties. See this note

Ouss_SAS
Calcite | Level 5

Hello,

 

Did you ever find a solution? I have the same issue.

Thanks

 

SteveDenham
Jade | Level 19

I don't think one was ever developed, so I will try again (after 5 years). This is based on the ENSCALE option, which rescales the parameters by (1 + L2). If you fit the naive elastic net (without rescaling) and save the model parameters, then repeat the fit with the ENSCALE option and save those parameters, you could match up the included variables and construct the ratio of the betas. From that ratio and a little (actually very little) algebra, you could calculate the final L2 value used: if each rescaled estimate is (1 + L2) times the naive one, then L2 is the ratio minus 1. Verification of this could be done with fixed L2 values.

 

Just an idea.  I haven't even begun to try to actually implement this.
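A minimal sketch of that algebra, untested against real GLMSELECT output. It assumes each rescaled coefficient equals (1 + L2) times the matching naive coefficient, as described above. The function name, the dictionary layout, and the coefficient values are hypothetical, and the estimates would first have to be exported from the two fits.

```python
from statistics import median


def l2_from_enscale_ratio(naive, rescaled, tol=1e-12):
    """Estimate L2 from two coefficient sets.

    ASSUMPTION: rescaled[name] == (1 + L2) * naive[name] for every selected
    effect. Pass only the selected effects; how the intercept is treated is
    not known here, so it is safer to leave it out.
    Returns (estimated L2, spread of the ratios as a consistency check).
    """
    ratios = [
        rescaled[name] / beta
        for name, beta in naive.items()
        if name in rescaled and abs(beta) > tol
    ]
    if not ratios:
        raise ValueError("no shared nonzero coefficients to compare")
    spread = max(ratios) - min(ratios)
    return median(ratios) - 1.0, spread


# Made-up coefficients: the "rescaled" set is built with L2 = 0.2.
naive = {"x1": 0.80, "x3": -0.25, "x7": 1.50}
rescaled = {name: 1.2 * beta for name, beta in naive.items()}
l2_hat, spread = l2_from_enscale_ratio(naive, rescaled)
print("estimated L2:", round(l2_hat, 6), "ratio spread:", spread)
```

A large spread among the ratios would suggest the (1 + L2) relationship does not hold for the two fits being compared, for example because they selected different effects.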

 

SteveDenham
