11-19-2015 04:35 AM
I have now updated my SAS/STAT to 14.1, which includes the LASSO selection method in HPGENSELECT.
Has anyone tried to do a logistic regression with HPGENSELECT?
Is it possible?
I am having some problems with the syntax for performing a logistic regression in HPGENSELECT.
Thanks for any advice on this.
11-19-2015 04:01 PM
Please post the syntax that is giving you the error. HPGENSELECT supports the DIST=BINARY and DIST=BINOMIAL options for logistic regression. For example, the following statements work:
proc hpgenselect data=sashelp.class;
   model sex(event="M") = height weight age / dist=binary;
   selection method=lasso;
run;
11-22-2015 07:30 AM
I had to add dist=binary, and then it worked!
However, I have one additional question. If you only write selection method=lasso, what is the default method for variable selection?
Is cross-validation included in HPGENSELECT with Lasso?
11-22-2015 12:16 PM
The HPGENSELECT documentation is online and answers all of these questions. Look at the SELECTION statement to see various defaults.
I don't understand your question about "the default method for variable selection." The LASSO method IS a variable-selection method, so the default method is LASSO. If you are talking about the SELECT= option, that option is not valid for LASSO.
Yes, you can use the PARTITION statement in conjunction with LASSO to do cross validation.
11-22-2015 03:19 PM
Sorry, I was not clear in my previous question. Different methods (AIC, BIC, cross-validation) can be used to select an optimal value of the regularization parameter in LASSO.
I have seen some code examples that use selection method=LASSO(choose=sbc).
If you don't enter anything after LASSO (i.e., no CHOOSE= option), which model does SAS use to estimate the regularization parameter?
Since LASSO is quite new in HPGENSELECT, I have not found any code examples of how to perform cross-validation in this procedure (this is the first time I have performed a LASSO regression).
Is this syntax correct?
proc hpgenselect data=sashelp.class;
   partition fraction(test=0.25 validate=0.25);
   model sex(event="M") = height weight age / dist=binary;
   selection method=lasso;
run;
11-23-2015 08:37 AM
If you run the statements that you propose, you will see an error in the log that says "ERROR: The TEST partition is not available for the LASSO method." You can use the VALIDATE= option to compute the AIC, AICC, BIC, and ASE statistics on the validation data.
You asked: "If you don't enter anything after LASSO (i.e., no CHOOSE= option), which model does SAS use to estimate the regularization parameter?"
Your question is answered in the documentation of the SELECTION statement, which I encourage you to read: "If you specify METHOD=LASSO and you do not specify either the CHOOSE= or the STOP= option, then the model in the last LASSO step is chosen as the selected model."
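For comparison with that default, here is a minimal sketch that asks HPGENSELECT to pick the LASSO step with the smallest Schwarz Bayesian criterion instead of taking the last step. It reuses the sashelp.class model from earlier in the thread and assumes CHOOSE=SBC is accepted, as in the examples mentioned above (SBC and BIC name the same criterion); check the SELECTION statement documentation for the exact keywords in your release.

```sas
/* Sketch: LASSO logistic regression on sashelp.class.                  */
/* Without CHOOSE=, the model from the last LASSO step is selected.     */
/* CHOOSE=SBC (keyword assumed, see the note above) instead selects the */
/* step whose Schwarz Bayesian criterion is smallest.                   */
proc hpgenselect data=sashelp.class;
   model sex(event="M") = height weight age / dist=binary;
   selection method=lasso(choose=sbc);
run;
```

Comparing the selected-model output of this run with the plain METHOD=LASSO run shows whether the criterion stops the path earlier than the last step.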
In my opinion, you should probably choose a CHOOSE= criterion. If you are going to specify a validation set, presumably you will want to use CHOOSE=VALIDATE.
By the way, if you add the DETAILS=ALL option to the SELECTION statement, then the output contains additional information that might help clarify what LASSO is doing at each step.
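Putting the last two suggestions together, a minimal sketch might hold out a random 25% of the rows for validation, let CHOOSE=VALIDATE pick the LASSO step based on that held-out data, and request DETAILS=ALL for per-step output. The 25% fraction is an arbitrary choice for illustration, and with only 19 observations in sashelp.class the validation set is far too small for a real analysis; the selected model may depend on which rows happen to be held out.

```sas
/* Sketch: LASSO logistic regression with a validation holdout.            */
/* TEST= is omitted because the TEST partition is not available for LASSO. */
proc hpgenselect data=sashelp.class;
   partition fraction(validate=0.25);        /* about 25% of rows held out at random */
   model sex(event="M") = height weight age / dist=binary;
   selection method=lasso(choose=validate)   /* choose the step using validation data */
             details=all;                    /* print details for each LASSO step */
run;
```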