hafidwr
Calcite | Level 5

I'm a new user of SAS. My thesis uses the Lasso to fit a multinomial logistic regression. I used R earlier, and I reckon that its Lasso implementation uses a more symmetric approach rather than the traditional K-1 logit model.

My response is categorical, with 4 categories: NoSchool, School1, School2, and School3. I intend to use NoSchool as the reference category, so I'll get 3 logit models for the outcome.

I had already fitted my multinomial logistic regression with the usual maximum likelihood (MLE) method in SPSS and got my model. I need my Lasso estimates presented in exactly the same form, with 3 logits. But when I display the coefficients in R, coefficients for every response category show up (including NoSchool). I understand from Friedman, Hastie and Tibshirani (2010) that a more symmetric approach is used. But my models need to be interpreted the same way as the usual MLE multinomial logistic model in SPSS. Is there any way I can do that?

I recently emailed Prof. Trevor Hastie himself and explained the same situation. He suggested that, to make the coefficients comparable to SAS, I subtract the glmnet coefficients (presumably the ones I got from R) for the class whose coefficients are missing in SAS from those of the other classes; then they are comparable.
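To spell out why that subtraction works, here is a short derivation. It is only a sketch: the symbols (K classes, reference class r, intercepts β_{0k}, slope vectors β_k) are my own notation, not something taken from the glmnet or SAS output.

```latex
% Symmetric (glmnet-style) parameterization: one coefficient vector per class
\[
P(Y = k \mid x) \;=\;
  \frac{\exp(\beta_{0k} + x^{\top}\beta_k)}
       {\sum_{l=1}^{K} \exp(\beta_{0l} + x^{\top}\beta_l)},
  \qquad k = 1,\dots,K .
\]

% The denominator cancels in the log-odds against the reference class r
\[
\log \frac{P(Y = k \mid x)}{P(Y = r \mid x)}
  \;=\; (\beta_{0k} - \beta_{0r}) + x^{\top}(\beta_k - \beta_r).
\]

% Reference-coded coefficients, i.e. the K-1 baseline-category logits
\[
\tilde\beta_{0k} = \beta_{0k} - \beta_{0r}, \qquad
\tilde\beta_k = \beta_k - \beta_r, \qquad
\tilde\beta_{0r} = 0,\ \tilde\beta_r = 0 .
\]
```

With NoSchool as r, the three differences for School1, School2 and School3 can be read as logits against NoSchool, and the fitted probabilities are unchanged. Note that this only re-expresses the glmnet fit; it does not guarantee the same numbers as a Lasso fitted directly in the reference-coded form.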

 

Excuse me, but I'm new to SAS, and I really need help.

Any help would be highly appreciated. Thank you.

 

StatDave
SAS Super FREQ

I'm not quite sure what you are asking, but maybe it will help to show how you can do Lasso selection for a nominal multinomial model in SAS. You can use PROC HPGENSELECT.  The following shows basically how it works. Put any categorical predictors in the CLASS statement as well as the MODEL statement. List continuous predictors in the MODEL statement only. The REF= option enables you to specify the reference level of the response. The results include final model parameters for the 3 logits (in your case). There are various options to control the selection. See details in the HPGENSELECT documentation. 

 

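proc hpgenselect data=mydata;
   /* categorical predictors go in CLASS as well as MODEL */
   class a b c;
   /* REF= sets the reference level of the response; GLOGIT gives one logit per other level */
   model school(ref="NoSchool")=a b c x1 x2 x3 / dist=mult link=glogit;
   /* lasso selection; AICC is the criterion used to choose the final model */
   selection method=lasso(choose=aicc) details=all;
run;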

 

hafidwr
Calcite | Level 5

Your reply was quite helpful, because you understood my point. Nevertheless, I have another slight problem.

I got an error when I ran the code. It said that my selection technique is not available in HPGENSELECT.

StatDave
SAS Super FREQ

See the "Lasso" item in the list of Frequently Asked-for Statistics (FASTats). As noted there, the Lasso method was first available in SAS 9.4 TS1M3. You might need to upgrade your release. 

hafidwr
Calcite | Level 5

Wow. Thank you very much. You're very helpful.

Maybe one more question. I don't know the equivalent syntax for "lambda.1se" or "lambda.min". I intend to use 10-fold cross validation if possible.

Can I choose cross validation instead of "AICC" to fit my lasso model?

StatDave
SAS Super FREQ

Yes. See the documentation of the SELECTION statement. Specify CHOOSE=VALIDATE; you must also have a PARTITION statement to create a validation portion of your data. Lasso can also be done for normal response data using PROC GLMSELECT with similar syntax, so you might want to look at the example in this article, which shows validation used as the lasso criterion.
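As a rough sketch of what that could look like, the earlier HPGENSELECT example can be adapted as follows. The data set and variable names are the same placeholders as before, and the 30% validation fraction and the seed are arbitrary values picked for illustration, so check the PARTITION and SELECTION statement documentation for your release.

```sas
/* Sketch only: mydata, school, a, b, c, x1-x3 are placeholder names */
proc hpgenselect data=mydata;
   class a b c;
   model school(ref="NoSchool")=a b c x1 x2 x3 / dist=mult link=glogit;
   /* randomly hold out about 30% of the rows as validation data
      (fraction and seed are arbitrary illustration values) */
   partition fraction(validate=0.3 seed=12345);
   /* choose the lasso step that performs best on the validation data */
   selection method=lasso(choose=validate) details=all;
run;
```

Keep in mind that this uses a single holdout split, so it is not the same thing as the 10-fold cross validation behind "lambda.min" or "lambda.1se" in glmnet.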
