I am documenting some code written by a former employee. I am not tremendously familiar with proc GLM, but I understand the basics. I am working with a statistician to make sure the code is doing what it should. What would putting class variables in the model statement achieve?
In our dataset, we have 3 independent variables, a1-a3, which are really just a^1, a^2, and a^3. We have three categorical variables, c1, c2, and c3. The procedure written is:
proc glm data = mydata;
class c2 c3;
model b=a1 a2 a3 c2*c3 /noint solution ss3;
output out=myoutput p=predicted_value;
run;
quit;
Neither the statistician nor I could figure out what adding c2*c3 to the model accomplishes, since they are not continuous variables. Any insight would be very helpful!
In PROC GLM (as well as other modelling procedures), the * sign does not mean "multiply" (as I think you thought) but "cross". c2*c3 is the crossed effect of c2 and c3, which means groups defined by the values of BOTH c2 and c3, and not groups for c2 on one hand and groups for c3 on the other.
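Applied to the model in the question, here is a sketch of what this amounts to. It assumes GLM's usual parameterization, where c2*c3 gets one indicator column per combination of c2 and c3 levels present in the data. Because of /noint there is no intercept and there are no c2 or c3 main effects to absorb any of those columns, so every cell keeps its own parameter. Writing a for the underlying variable (a1 = a, a2 = a^2, a3 = a^3), and using beta and gamma as labels chosen here for illustration:

$$
b_i = \beta_1 a_i + \beta_2 a_i^2 + \beta_3 a_i^3 + \gamma_{jk} + \varepsilon_i,
\qquad j = \text{level of } c2,\; k = \text{level of } c3 \text{ for observation } i
$$

In other words, all cells share one cubic curve in a, and c2*c3 shifts that curve up or down separately for each c2 x c3 combination. With /solution, PROC GLM prints one gamma estimate per combination, and the predicted_value written by the OUTPUT statement includes the shift for the observation's cell.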
Imagine you have a study of respiratory capacity (lung volume or something like that) recorded for patients, some of whom are smokers and others non-smokers; some are men and others women.
When you test only the SEX effect, you learn whether lung volume differs between men and women (it usually does). When you test only the SMOKE_HABIT effect, you learn whether smoking decreases lung capacity, regardless of sex.
What you test with SEX*SMOKE_HABIT is whether the effect of smoking on respiratory capacity is different for women and for men, i.e. whether you have to look at men and women separately to describe how smoking alters it.
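For the 2 x 2 case this can be written in terms of the four cell means. The notation is introduced here for illustration: mu is mean lung volume, M/F is sex, and S/N is smoker/non-smoker. In a model that also contains the SEX and SMOKE_HABIT main effects, the interaction test is the hypothesis that smoking shifts the mean by the same amount in both sexes:

$$
H_0:\; (\mu_{M,S} - \mu_{M,N}) - (\mu_{F,S} - \mu_{F,N}) = 0
$$

Rejecting it means the effect of smoking on respiratory capacity depends on sex.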