Levi_M
Fluorite | Level 6

I am conducting a logistic regression on variables that were selected via LASSO (hpgenselect). I have a question about my methodology.

(1) If I want to control for age and gender, do I exclude them from the lasso selection but include them in the logistic regression? Or do I include them in both the lasso and logistic regression?

(2) Is it appropriate to use hpgenselect and follow up with logistic regression?

- Thank you - 

1 ACCEPTED SOLUTION
StatDave
SAS Super FREQ

Yes, you can use PROC HPGENSELECT with METHOD=LASSO in the SELECTION statement. If you want the final model to control for age and gender, those variables need to be forced to stay in the model: use the INCLUDE= option in the MODEL statement to specify the effects you want to keep, for example INCLUDE=(AGE GENDER).
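
A minimal sketch of the two-step workflow asked about above, assuming a hypothetical data set WORK.STUDY with a binary outcome Y, the forced covariates AGE and GENDER, and illustrative candidate predictors X1 through X4 (none of these names come from the original post). As clarified later in this thread, INCLUDE= belongs in the MODEL statement. The effects in the PROC LOGISTIC refit are placeholders for whatever the lasso step actually retains:

```sas
/* Step 1: lasso selection with age and gender forced into every model. */
/* WORK.STUDY, Y, and X1-X4 are hypothetical names used for illustration. */
proc hpgenselect data=work.study;
   class gender;
   model y(event='1') = age gender x1 x2 x3 x4
         / dist=binomial include=(age gender);
   selection method=lasso(choose=bic) details=all;
run;

/* Step 2: refit an ordinary logistic regression on the retained effects. */
/* X2 and X4 are placeholders; substitute the effects selected in step 1. */
proc logistic data=work.study;
   class gender / param=ref;
   model y(event='1') = age gender x2 x4;
run;
```

The refit gives unpenalized estimates, but its standard errors and p-values do not account for the selection step, so they should be read with caution.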

Levi_M
Fluorite | Level 6

Thank you so much for your quick and informative response.

Amyzlot1
Calcite | Level 5

Can you provide SAS code? I'm getting errors when I add the INCLUDE= option.

 

ods graphics on / LABELMAX=1900;

proc hpgenselect data=cs.analysis_final;
class Hispanic_includes_all_races IDU HCV;
model cs_case = Hispanic_includes_all_races IDU HCV;
selection method=lasso (stop=none choose=bic) details=all
include=Hispanic_includes_all_races;
/* bicplot / plotfit=yes; */
ods output Coefficients=lassocoef;
run;

 

Thank you!

StatDave
SAS Super FREQ

The INCLUDE= option goes in the MODEL statement, not the SELECTION statement. See the HPGENSELECT documentation.

Amyzlot1
Calcite | Level 5

Thank you! This worked!

ods graphics on / LABELMAX=1900;

proc hpgenselect data=cs.analysis_final ;
class Hispanic_includes_all_races age_dich IDU HCV;
model cs_case (event=last) = Hispanic_includes_all_races age_dich IDU HCV / include= (Hispanic_includes_all_races age_dich) dist=binomial link=log;
selection method=lasso (stop=none choose=bic) details=all;
output out=Out xbeta predicted=Pred;
run;

 

Can you generate R-squared statistics and plots in PROC HPGENSELECT? (I've checked the documentation but I'm not finding example code.) Thanks!
