03-22-2017 09:14 AM - edited 04-17-2017 01:42 PM
I am at my wits' end trying to find SAS implementations of several standard statistical procedures, having come from the world of R and Python. Can someone please help me? I am trying really hard to appreciate SAS.
How can one fit a logistic regression with a ridge (L2) penalty in SAS? According to comments here and here, this should already be implemented in SAS with PROC HPGENSELECT. But how? I am new to SAS and a little disoriented, and I am generally having a hard time finding R-analogues in SAS.
Note: The Newton-Raphson with ridging method in PROC HPGENSELECT, which is applied "as needed", is probably there for numerical stability when maximizing the likelihood (especially when there is multicollinearity). I am guessing that the ridge parameter there is really tiny. For proper ridge regression, you want to penalize large coefficients and must be able to tune the ridge parameter to your needs.
03-22-2017 09:42 AM
I interpret @SteveDenham's response in the previous thread to mean that PROC HPGENSELECT supports the LASSO method for variable selection. I don't think he meant to imply that the procedure has an option for user-controlled ridge regression (the way that PROC REG does). In other words, I think the sources you linked to claim that HPGENSELECT uses ridge regression internally as part of the LASSO method, not that the ridging method is surfaced to the user.
03-22-2017 10:22 AM - edited 03-22-2017 10:23 AM
The user in the linked thread was clearly asking for an implementation of ridge logistic regression, so your interpretation seems strange to me. I understood @SteveDenham to mean that this functionality would be bundled in with the Lasso method, since the user was directly asking about *ridge* logistic regression.
I agree that in the comments section of your article that I also linked to, you answered a user's question about how to implement ridge logistic regression by advertising a different product instead (Lasso). It seems you think this is not doable in SAS, without the tedious effort of researching and writing your own MACRO for it.
03-22-2017 10:50 AM - edited 03-22-2017 10:56 AM
I don't want to argue about other people's intentions, so let me rephrase my answer. I don't think HPGENSELECT provides control over the ridging method. It sounds like you want to implement this method in SAS. I suggest you use the SAS/IML matrix language rather than the macro language. As you know, logistic regression has no closed-form solution; it requires an iterative method to optimize the log likelihood (LL).
There is an example of parameter estimation for the logistic model in the SAS/IML doc. You can start with that example, but instead of inverting the weighted normal equations (the line xpxi = inv(...)) you would solve a ridged equation that might look something like (untested):
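/* Newton step for LL - (lambda/2)*b`*b; the ridge term enters both sides */
A = xx`*(w#xx) + lambda*I(ncol(xx)); /* add ridging to the information matrix */
RHS = xx`*(w#(y-p)) - lambda*b;      /* gradient of the penalized LL */
db = solve(A, RHS);                  /* solve the system instead of inverting */
b = b + db;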
Here lambda is a fixed ridge parameter. Since you are coming from R, you should have no problem writing IML code, which is similar in spirit. If you are new to SAS/IML, see "Ten tips for learning the SAS/IML language". If you get stuck, the SAS/IML Support Community is available for assistance.
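To make the idea above concrete, here is a minimal, untested sketch of the whole iteration in SAS/IML. It assumes a data set named remission with a 0/1 response remiss and predictors cell, li, and temp (names chosen for illustration), writes the penalty as (lambda/2)*b`*b, and penalizes the intercept along with the slopes. The gradient includes the term -lambda*b: adding lambda only to the matrix on the left-hand side would damp the Newton steps but still converge to the unpenalized estimate.

```sas
proc iml;
/* Assumed data set and variable names; adjust to your data */
use remission;
read all var {cell li temp} into x;
read all var {remiss} into y;
close remission;

x = j(nrow(x), 1, 1) || x;        /* prepend an intercept column */
lambda = 0.1;                     /* fixed ridge parameter (arbitrary value) */
b = j(ncol(x), 1, 0);             /* starting values */
oldb = b + 1;

do iter = 1 to 50 while(max(abs(b - oldb)) > 1e-8);
   oldb = b;
   p = 1 / (1 + exp(-x*b));       /* fitted probabilities */
   w = p # (1 - p);               /* logistic weights */
   A = x`*(w#x) + lambda*I(ncol(x));   /* penalized information matrix */
   RHS = x`*(y - p) - lambda*b;        /* gradient of the penalized LL */
   b = b + solve(A, RHS);         /* Newton update */
end;

print iter, b;
quit;
```

In practice you would probably standardize the predictors and leave the intercept unpenalized (for example, by setting the first diagonal element of the penalty matrix to zero), and you would choose lambda by cross-validation rather than fixing it in advance.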
03-23-2017 03:32 PM
There are two types of shrinkage (aka penalization, aka regularization): L1 regularization (or lasso), which adds an absolute value penalty, and L2 regularization (or ridging), which adds a quadratic penalty. A combination of these is the so-called elastic net. L1 regularization (lasso) and the combination (elastic net) are available in PROC HPGENSELECT. L2 regularization (ridging) should be possible in NLMIXED by simply adding the penalty to the log likelihood. For example, these statements add the quadratic penalty on the parameters of a logistic model using a 0.1 shrinkage parameter, which could, of course, be adjusted. The data are the remission data in the first example in the LOGISTIC documentation.
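proc nlmixed data=remission;
parms b0=0 b1=0 b2=0 b3=0;
p = 1/(1 + exp(-(b0 + b1*cell + b2*li + b3*temp))); /* assumed: this line and the next were missing from the post */
ll = remiss*log(p) + (1-remiss)*log(1-p) - 0.1*(b0**2 + b1**2 + b2**2 + b3**2);
model remiss ~ general(ll);
run;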
Compare the results to those from the unpenalized (unridged) logistic model:
proc logistic data=remission;
model remiss(event='1') = cell li temp;
run;
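For reference, the three penalties described above can be written out explicitly. The notation here is an assumption made for illustration (y_i is the 0/1 response, x_i the covariate vector, and lambda, lambda_1, lambda_2 the shrinkage parameters); software packages differ in how they scale the penalty and in whether the intercept is included.

```latex
\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr],
\qquad p_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}

\text{ridge (L2):}\quad       \ell(\beta) - \lambda \sum_{j} \beta_j^{2}
\text{lasso (L1):}\quad       \ell(\beta) - \lambda \sum_{j} \lvert \beta_j \rvert
\text{elastic net:}\quad      \ell(\beta) - \lambda_1 \sum_{j} \lvert \beta_j \rvert - \lambda_2 \sum_{j} \beta_j^{2}
```

One practical point for the NLMIXED approach: with GENERAL(ll), the procedure sums ll over the observations, so a penalty written inside ll is counted once per row. If you want the total penalty to equal lambda times the sum of squared coefficients, divide the penalty term by the number of observations.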