bollibompa
Quartz | Level 8

Hi,

 

I have now updated my SAS/STAT to 14.1, which includes LASSO selection in HPGENSELECT.

 

Has anyone tried to do a logistic regression with HPGENSELECT?

Is it possible?

I am having some problems with the syntax for performing a logistic regression in HPGENSELECT.

 

Thanks for all advice regarding this.

/Thomas

Accepted Solutions
Rick_SAS
SAS Super FREQ

Please post the syntax that is giving you the error.  HPGENSELECT supports the DIST=BINARY and DIST=BINOMIAL options for logistic regression.  For example, the following statements work:

 

proc hpgenselect data=sashelp.class; 
   model sex(event="M") = height weight age / dist=binary;
   selection method=lasso;
run;

bollibompa
Quartz | Level 8

Many thanks!

I had to add dist=binary, then it worked!

 

However, one additional question. If you only write selection=lasso, what is the default method for variable selection?

Is cross-validation included in HPGENSELECT with Lasso?

 

Thanks

Thomas

Rick_SAS
SAS Super FREQ

The HPGENSELECT documentation is online and answers all of these questions. Look at the SELECTION statement to see various defaults.

 

I don't understand your question about "the default method for variable selection." The LASSO method IS a variable-selection method, so the default method is LASSO.  If you are talking about the SELECT= option, that option is not valid for LASSO.

 

Yes, you can use the PARTITION statement in conjunction with LASSO to do cross validation.

bollibompa
Quartz | Level 8

Thanks again!

 

Sorry, I was not clear in my previous question. Different methods (AIC, BIC, cross-validation) can be used to select an optimal value of the regularization parameter in LASSO.

I have seen some code examples where selection=LASSO(choose=sbc).

If you don't enter anything after LASSO (i.e., no CHOOSE= option), which model does SAS use to estimate the regularization parameter?

 

Since LASSO is quite new in HPGENSELECT, I have not found any code examples of how to perform cross-validation in this procedure (this is the first time I have performed a LASSO regression).

 

Could this be a correct syntax:

 

proc hpgenselect data=sashelp.class;
   partition fraction(test=0.25 validate=0.25);
   model sex(event="M") = height weight age / dist=binary;
   selection method=lasso;
run;

 

 

/Thomas

 

 

Rick_SAS
SAS Super FREQ

If you run the statements that you propose, you will see a note in the log that says "ERROR: The TEST partition is not available for the LASSO method."   You can use the VALIDATE= option to compute the AIC, AICC, BIC, and ASE statistics on the validation data.


bollibompa wrote:

If you don't enter anything after LASSO (ie no choose option), which model does SAS use to estimate the regularization parameter?


Your question is answered in the documentation of the SELECTION statement, which I encourage you to read: "If you specify METHOD=LASSO and you do not specify either the CHOOSE= or STOP= option, then the model in the last LASSO step is chosen as the selected model."  

 

In my opinion, you should probably choose a CHOOSE= criterion. If you are going to specify a validation set, presumably you will want to use CHOOSE=VALIDATE.

 

By the way, if you add the DETAILS=ALL option to the SELECTION statement, then the output contains additional information that might help clarify what LASSO is doing at each step.
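
Putting those suggestions together, here is a minimal sketch. It assumes that your SAS/STAT release accepts CHOOSE=VALIDATE and DETAILS=ALL together with METHOD=LASSO; check the SELECTION statement documentation to confirm. The TEST= fraction is dropped because LASSO rejects the TEST partition, and the 0.25 validation fraction is only illustrative (sashelp.class has just 19 observations, far too few for a meaningful holdout).

```sas
/* Sketch only: LASSO logistic regression with a validation partition. */
/* Assumption: CHOOSE=VALIDATE is supported for METHOD=LASSO in your   */
/* release; if it is not, try an information criterion instead.        */
proc hpgenselect data=sashelp.class;
   /* Hold out 25% of the rows for validation; no TEST= fraction,      */
   /* because the TEST partition is not available for LASSO.           */
   partition fraction(validate=0.25);
   /* Binary response with "M" as the modeled event.                   */
   model sex(event="M") = height weight age / dist=binary;
   /* Choose the step with the best validation fit and print details   */
   /* for every LASSO step.                                            */
   selection method=lasso(choose=validate) details=all;
run;
```

If you omit the CHOOSE= suboption, the model from the last LASSO step is selected, as the documentation quote above states.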

bollibompa
Quartz | Level 8

Thanks again for your support!

/Thomas

Rick_SAS
SAS Super FREQ

You are welcome. If you think the question has been answered, please close the thread.
