My variable pool contains both categorical and continuous variables. I plan to use / selection=stepwise to do stepwise variable selection.
For categorical variables, is it enough to list them in the CLASS statement? Will the selection process treat the different levels of a categorical variable as one variable? Or do I need to specify the categorical variables with something like groupnames='Height' 'Age', as in PROC REG variable selection?
Will the selection process treat the different levels of a categorical variable as one variable?
Yes, it will. If you want each level to be considered individually, you need to create your own dummy variables, but selecting individual levels is generally not statistically recommended anyway.
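For anyone who does want level-by-level selection, here is a minimal sketch of building the dummy variables by hand. The data set work.mydata, the variable region, and its values are hypothetical names used only for illustration.

```sas
/* Hypothetical data set and variable names, for illustration only.   */
/* Each comparison evaluates to 1 (true) or 0 (false) in a DATA step. */
data work.mydata_dummies;
   set work.mydata;
   region_north = (region = 'North');
   region_south = (region = 'South');
   /* The remaining level of region is left out as the reference level. */
run;
```

The resulting 0/1 columns can then be listed in the MODEL statement as ordinary numeric effects, so each one enters or leaves the model on its own.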
I don't want to treat them individually; I want the different levels to be treated as a single group. So is the CLASS statement in PROC LOGISTIC enough to restrict the variable selection method so that a group of variables enters or leaves the model together?
Can you clarify? Your title says 'logistic' but you mention "PROC REG."
For linear regression models, look at PROC GLMSELECT.
For logistic regression models, look at PROC HPGENSELECT.
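As a rough sketch of what those two suggestions could look like, assuming a hypothetical data set work.mydata with a continuous response y, a binary response outcome coded 0/1, class variables region and smoker, and continuous predictors age and height (none of these names come from the original question):

```sas
/* Hypothetical data set and variable names, for illustration only. */

/* Linear regression: stepwise selection with PROC GLMSELECT. */
proc glmselect data=work.mydata;
   class region smoker;
   model y = region smoker age height / selection=stepwise;
run;

/* Logistic regression: stepwise selection with PROC HPGENSELECT. */
proc hpgenselect data=work.mydata;
   class region smoker;
   model outcome(event='1') = region smoker age height / dist=binary;
   selection method=stepwise;
run;
```

Note that PROC HPGENSELECT requests selection through a separate SELECTION statement rather than a MODEL statement option.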
I used to use PROC REG with the GROUPNAMES= option to restrict effect selection so that a group of variables enters or leaves the model together. Now that I'm going to use PROC LOGISTIC, I'm wondering whether the CLASS statement achieves the same result as GROUPNAMES= during the variable selection process.
Unless you specify the SPLIT option on the CLASS statement in PROC GLMSELECT or PROC HPGENSELECT, the whole variable will either be included or excluded. PROC LOGISTIC does not support the SPLIT option, so again the whole variable is selected or not.
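To make the contrast concrete, here is a minimal sketch under assumed names (work.mydata, outcome, y, region, smoker, age, and height are all hypothetical). In the first step every CLASS variable is tested as a single multi-degree-of-freedom effect; in the second, the SPLIT option lets the design columns for region enter or leave separately.

```sas
/* Hypothetical data set and variable names, for illustration only. */

/* PROC LOGISTIC: region and smoker each enter or leave as a whole. */
proc logistic data=work.mydata;
   class region smoker / param=ref;
   model outcome(event='1') = region smoker age height
         / selection=stepwise slentry=0.05 slstay=0.05;
run;

/* PROC GLMSELECT: SPLIT allows the levels of region to be selected */
/* independently; smoker is still treated as a single effect.       */
proc glmselect data=work.mydata;
   class region(split) smoker;
   model y = region smoker age height / selection=stepwise;
run;
```

The SLENTRY= and SLSTAY= values shown are arbitrary choices for the sketch, not recommendations.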
got it, thanks!