Levi_M
Fluorite | Level 6

I am conducting a logistic regression on variables that were selected via LASSO (hpgenselect). I have a question about my methodology.

(1) If I want to control for age and gender, do I exclude them from the lasso selection but include them in the logistic regression? Or do I include them in both the lasso and logistic regression?

(2) Is it appropriate to use hpgenselect and follow up with logistic regression?

- Thank you - 

1 ACCEPTED SOLUTION
StatDave
SAS Super FREQ

Yes, you can use PROC HPGENSELECT with METHOD=LASSO in the SELECTION statement. If you want the final model to control for age and gender, force those effects to stay in the model with the INCLUDE= option in the MODEL statement, for example INCLUDE=(AGE GENDER).
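
A minimal sketch of what that could look like. The data set name (work.mydata), the response (outcome), and the candidate predictors (x1, x2, x3) are hypothetical placeholders, not taken from the original question, and the LASSO settings simply mirror the ones used later in this thread rather than a recommendation:

```sas
/* Sketch only: work.mydata, outcome, and x1-x3 are hypothetical names. */
proc hpgenselect data=work.mydata;
   class gender;
   /* INCLUDE= in the MODEL statement forces age and gender to stay in the model. */
   model outcome(event='1') = age gender x1 x2 x3
         / dist=binomial link=logit include=(age gender);
   /* LASSO selection then applies to the remaining candidate effects. */
   selection method=lasso(stop=none choose=bic) details=all;
run;
```

With this setup, age and gender are listed both as model effects and in INCLUDE=, so they are never dropped by the selection.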


5 REPLIES
Levi_M
Fluorite | Level 6

Thank you so much for your quick and informative response.

Amyzlot1
Calcite | Level 5

Can you provide SAS code? I'm getting errors when I add the INCLUDE= option.

 

ods graphics on / LABELMAX=1900;

proc hpgenselect data=cs.analysis_final;
class Hispanic_includes_all_races IDU HCV;
model cs_case = Hispanic_includes_all_races IDU HCV;
selection method=lasso (stop=none choose=bic) details=all
include=Hispanic_includes_all_races;
/* bicplot / plotfit=yes; */
ods output Coefficients=lassocoef;
run;

 

Thank you!

StatDave
SAS Super FREQ
The INCLUDE= option goes in the MODEL statement, not the SELECTION statement. See the HPGENSELECT documentation.
Amyzlot1
Calcite | Level 5

Thank you! This worked!

ods graphics on / LABELMAX=1900;

proc hpgenselect data=cs.analysis_final ;
class Hispanic_includes_all_races age_dich IDU HCV;
model cs_case (event=last) = Hispanic_includes_all_races age_dich IDU HCV / include= (Hispanic_includes_all_races age_dich) dist=binomial link=log;
selection method=lasso (stop=none choose=bic) details=all;
output out=Out xbeta predicted=Pred;
run;

 

Can you generate R-squared statistics and plots in PROC HPGENSELECT? (I've checked the documentation but I'm not finding example code.) Thanks!
