09-28-2015 01:45 PM - edited 09-28-2015 01:46 PM
I am currently fiddling around with using LASSO, adaptive LASSO, and elastic net methods using PROC GLMSELECT in SAS 9.4.
One issue that I am running up against, however, is that there doesn't seem to be an option for GLMSELECT to actually display the selected value of the regularization/constraint parameter used in these techniques. For all three methods, you can explicitly provide a value for the penalization parameter, or it can determine the value automatically. For example, for elastic net, if you don't specify a value for L2 (the ridge regression penalty parameter), SAS searches for the optimal value of L2 over a range according to the specified CHOOSE method.
However, as far as I can find, SAS never reports or outputs the value of this constraint parameter. This is problematic for a number of reasons. For one, the SAS documentation notes that you should specify the value of L2 if you have a good estimate of what the constraint parameter should be, but SAS provides no way to actually obtain such an estimate. For another, if you want to use the model averaging functionality of GLMSELECT in combination with the elastic net method, you MUST specify a value of L2 (if you don't, SAS returns an error).
Ideally, you would be able to run GLMSELECT once with elastic net to determine an optimal value of L2 to then plug into the model averaging. However, I cannot find anything in the standard output or documentation that makes this possible. So, am I missing something? Is there some way to force SAS to report the explicit values of the regularization/constraint parameters that it necessarily estimates as part of these penalized regression methods? If not, what is a sensible way to determine reasonable values for those situations in which they must be provided explicitly?
09-29-2015 08:01 AM
This looks like a work in progress to me, but here are some of the steps I would try:
1. Specify the L2SEARCH= option explicitly.
2. Turn ODS TRACE on.
3. Check the log for the names of any output tables that may contain the values.
I can't guarantee anything at this point, but that is where I would start.
09-30-2015 09:20 AM - edited 09-30-2015 09:20 AM
That's a great idea! Unfortunately, I haven't found anything particularly fruitful following that advice, but it was a good idea, and there is still the possibility that some obscure option I haven't explored yet will sneakily output it to one of those datasets. Hopefully someone at SAS will see this thread and make it more straightforward for us!
09-30-2015 01:43 PM - edited 09-30-2015 02:02 PM
Thanks, lvm. I did contact SAS technical support, and will update this thread with any solutions.
In the meantime, I have two ideas for how to overcome the problem, but they each have issues of their own:
1) It is possible to use ridge regression in PROC REG. Since the L2= specification in Elastic Net is a ridge regression parameter, it may be possible to tune the ridge regression in PROC REG and then carry the resulting value over to PROC GLMSELECT. Doing so seems to give reasonable results. However, PROC REG does not have a built-in method for optimizing the ridge parameter; it simply fits the model over a specified range of ridge values and outputs the estimates. Further, this is fairly ad hoc, since there is no guarantee that the optimal parameter in a ridge regression is equivalent to the optimal L2 parameter in the elastic net setting.
2) Using the R glmnet package, one can run an elastic net regression and cross-validate to get optimal values for the various parameters. I figured this might be a decent way to get initial values to feed into PROC GLMSELECT (because, after all, for a variety of reasons I want to keep the majority of the analysis in SAS). However, the glmnet package uses a different parameterization of the elastic net than PROC GLMSELECT, and trying to "convert" them doesn't result in sensible values.
The R package has one lambda parameter and an alpha parameter that describes the amount of "mixing" (i.e., the weights given to the ridge and LASSO penalties in the elastic net). Because alpha weights the LASSO term in glmnet, in theory lambda*alpha should give you L1 and lambda*(1-alpha) should give you L2, up to constant scaling factors, but the values I get from R in such a fashion and plug into SAS (with alpha=0.5) give me radically different results. Without the ability to check the values of L1 SAS is using (and with GLMSELECT apparently not allowing you to specify both an L1 and L2 value), I don't see how to cross-reference.
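For anyone attempting the same conversion, here is a minimal sketch of the arithmetic in Python. It relies on the objective that glmnet documents for the Gaussian family, (1/(2n))*RSS + lambda*(alpha*|b|_1 + ((1-alpha)/2)*|b|_2^2), in which alpha weights the LASSO term and the loss is scaled by 1/(2n). The target form, an unscaled RSS plus separate L1 and L2 penalties, is an assumption about what GLMSELECT minimizes and should be checked against the SAS documentation. The function name is hypothetical, and the sketch ignores any differences in how the two packages standardize the predictors and the response, which may also explain part of the mismatch.

```python
def glmnet_to_unscaled_penalties(lam, alpha, n_obs):
    """Rescale glmnet's (lambda, alpha) to weights on an unscaled RSS objective.

    glmnet (Gaussian family) minimizes
        (1 / (2 * n)) * RSS + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    Multiplying through by 2 * n gives the equivalent objective
        RSS + (2 * n * lam * alpha) * |b|_1 + (n * lam * (1 - alpha)) * |b|_2^2.

    ASSUMPTION: the target software minimizes RSS + l1 * |b|_1 + l2 * |b|_2^2
    on the same (identically standardized) data. Verify this before relying
    on the returned values.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if lam < 0 or n_obs <= 0:
        raise ValueError("lam must be non-negative and n_obs positive")
    l1 = 2.0 * n_obs * lam * alpha        # weight on the sum of |b_j|
    l2 = n_obs * lam * (1.0 - alpha)      # weight on the sum of b_j^2
    return l1, l2


# Hypothetical inputs: lambda chosen by cross-validation in glmnet with
# alpha = 0.5 on a data set of 200 observations.
l1_weight, l2_weight = glmnet_to_unscaled_penalties(lam=0.1, alpha=0.5, n_obs=200)
```

Note that under this rescaling the two weights differ by a factor of 2 even at alpha=0.5, and both grow with the sample size, so plugging lambda*alpha or lambda*(1-alpha) in directly would not be expected to match.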