I am using PROC GLMSELECT for a multiple linear regression model whose explanatory variables include categorical variables with more than two levels. I am examining the relationship between stress scores and sexual health variables. I am pretty new to SAS, so I need some help determining whether I am coding this correctly and whether my interpretation is correct.
Here is my model:
proc GLMSELECT data=baseline;
class gender;
model PSS_score = sexual_orient ever_preg cons_sex_age partners_6M CES_D_dep
/ selection=stepwise select=SL showpvalues SLE=0.05 ;
title "Stepwise Regression SRH for OVERALL, 0.05";
run;
1) Does this model look like it's coded correctly?
2) When I run this, how exactly do I interpret the results? For example, my variable "cons_sex_age" (age at first sex) has three levels (never had sex, 15 and under, 16 and over), coded as 0, 1, 2. How do I interpret the relationship between stress score and "cons_sex_age"? Would my reference category be what is coded as 0? In other words, would the parameter estimate be the relationship between stress score and "15 and under", which is coded as 0? Or do I need to create dummy variables for GLMSELECT?
I have found the SAS guidance notes on this quite confusing.
Thanks,
1. No, I do not think your code does what you intend, though it is syntactically correct. For starters, I think you want the REF= and PARAM= options on the CLASS statement. Otherwise the default parameterization is GLM, and you should check the design matrix that the procedure outputs to see how the variable is dummy coded.
2. Once you've made the changes above, you can pick which level is the reference level. You do not need to create the dummy variables yourself.
Most of these options are better explained in the documentation for PROC LOGISTIC, so finding a good example there may be an easier starting point.
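As a rough sketch of what that could look like: the posted CLASS statement lists only gender, which is not in the model, so the categorical predictors would be treated as continuous. The version below assumes that sexual_orient, ever_preg, and cons_sex_age are the categorical predictors and that partners_6M and CES_D_dep are continuous (an assumption, adjust to your data). The data set name design_check is made up for illustration.

```sas
/* Sketch only: which variables are categorical is an assumption. */
proc glmselect data=baseline outdesign=design_check;
   /* Reference-cell coding; level 0 ("never had sex") is the       */
   /* reference for cons_sex_age. The other CLASS variables use the */
   /* default reference level (the last level in sorted order).     */
   class sexual_orient ever_preg cons_sex_age(ref='0') / param=ref;
   model PSS_score = sexual_orient ever_preg cons_sex_age partners_6M CES_D_dep
         / selection=stepwise select=SL sle=0.05 showpvalues;
   title "Stepwise Regression SRH for OVERALL, 0.05";
run;

/* Inspect the design columns written for the selected model */
/* (design_check is a hypothetical name).                    */
proc print data=design_check(obs=10);
run;
```

With PARAM=REF and ref='0', the estimate for cons_sex_age level 1 is the difference in mean PSS_score between "15 and under" and "never had sex", and the estimate for level 2 is the difference between "16 and over" and "never had sex", holding the other selected effects fixed. Under the default GLM parameterization, the last level in sorted order (here 2) has its parameter set to zero, so it effectively acts as the reference instead.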
