Season
Lapis Lazuli | Level 10

OK, thank you very, very much for the valuable information!

I will investigate PROC GLMMOD in detail.

StatDave
SAS Super FREQ

Models with interactions can be handled in the same way as models with categorical predictors, which I mentioned in my Wednesday post. I should have included a link to this note, which discusses the procedures that can save the design (model) matrix in a data set with the columns of the matrix as variables. You can then use those variables in the MODEL statement of PROC REG to represent the original model, even if it contained CLASS variables and/or interactions. Of those procedures, I generally recommend PROC LOGISTIC with the PARAM=REF option if the model contains any categorical predictors. That avoids the problem I mentioned earlier of using k dummy variables to represent a predictor with k levels. For example, if a and b each have levels 1, 2, and 3 (under PARAM=REF, level 3 is the default reference level), then the following code uses the Hessian weights of the original model in PROC REG to get the collinearity diagnostics for the original model.

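   /* fit the desired model and save the Hessian weights in variable w */
   proc genmod data=mydata;
      class a b;
      model y=a b a*b x / dist=binomial scoring=50 corrb;
      output out=out hesswgt=w;
      run;
   /* get the columns of the design matrix for use in REG.             */
   /* x is continuous, so only a and b belong in the CLASS statement.   */
   /* w is listed in MODEL only to carry it into XMATRIX; with          */
   /* OUTDESIGNONLY no model is fit.                                    */
   proc logistic data=out outdesign=xmatrix outdesignonly;
      class a b / param=ref;
      model y=a b x a*b w;
      run;
   /* weight by the Hessian weights so the diagnostics reflect the original model */
   proc reg data=xmatrix;
      weight w;
      model y=a1 a2 b1 b2 a1b1 a1b2 a2b1 a2b2 x / collin collinoint;
      run;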
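The PROC REG step has to name the design columns exactly as PROC LOGISTIC wrote them to XMATRIX. If your CLASS variables have other levels or formats than the 1/2/3 example, the names may differ, so listing the columns first is a cheap check. The data set name XMATRIX comes from the code above; nothing else here is assumed.

   /* list the design-matrix columns in the order they were created */
   proc contents data=xmatrix varnum;
      run;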
Season
Lapis Lazuli | Level 10

Thank you very much for your information and step-by-step guide! I have tried your method on my data, and it worked very well for diagnosing collinearity.

But another question follows: should we standardize all of the independent variables (including the interaction terms), or only the original independent variables (leaving the interaction terms unstandardized)?

I guess that standardizing both the original independent variables and the interaction terms included in the model is the correct answer, but I am not sure, so I am asking for your help.
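One common ordering is to standardize only the original predictors and then form the interaction from the standardized values. The sketch below shows the mechanics only and is not an answer given in this thread. It assumes hypothetical continuous predictors x1 and x2 in mydata.

Keep in mind that in PROC REG, tolerance, VIF, and the COLLIN diagnostics do not change when a column is merely rescaled, because COLLIN scales the crossproducts matrix to have ones on the diagonal. The step that actually changes the interaction column is centering the variables before they are multiplied.

   /* Hypothetical example: x1 and x2 are continuous predictors in MYDATA.  */
   /* Standardize only the original predictors (subtract mean, divide by    */
   /* SD). OUT= keeps all variables, with x1 and x2 replaced.               */
   proc stdize data=mydata out=std method=std;
      var x1 x2;
      run;
   /* form the interaction from the standardized predictors */
   data std;
      set std;
      x1x2 = x1*x2;
      run;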

Many thanks!

Season
Lapis Lazuli | Level 10

But now I am confronted with another problem. I found that several interaction terms were statistically (and, of course, substantively) significant in my logistic regression model. The interaction terms were significant whether I used the unstandardized or the standardized variables as independent variables in the model.

I tried the PROC GENMOD process for building Hessian weights demonstrated in the note you provided. The Hessian weights generated with the interaction terms included differed from those generated without them. However, both sets of results showed collinearity between several independent variables and the intercept, while no collinearity was observed among the independent variables themselves.

In addition, I tried to use PROC REG to compute tolerance, VIF, and condition indices. However, the MODEL statement of PROC REG does not accept interaction terms, whether written as "a|b" or as "a b a*b". So even though the Hessian weights were computed with the interaction terms in the model, there was no way to include those terms when computing the collinearity indicators (tolerance, VIF, and condition index).
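PROC REG's MODEL statement accepts only variable names. For interactions between continuous predictors, one workaround is to create the product column in a DATA step and supply the Hessian weights through a WEIGHT statement. For CLASS variables, the OUTDESIGN= approach in StatDave's reply builds the product columns instead.

The sketch below assumes a binary response y and hypothetical continuous predictors x1 and x2. For a binary logit model, each observation's Hessian weight is p(1-p), where p is the predicted probability from the model that includes the interaction.

   /* fit the model WITH the interaction and save predicted probabilities */
   proc logistic data=mydata;
      model y(event='1') = x1 x2 x1*x2;
      output out=pred p=p;
      run;
   /* Hessian weight for a binary logit model, plus an explicit product column */
   data pred;
      set pred;
      w = p*(1-p);
      x1x2 = x1*x2;
      run;
   /* weighted collinearity diagnostics that now include the interaction term */
   proc reg data=pred;
      weight w;
      model y = x1 x2 x1x2 / tol vif collin collinoint;
      run;

The response in the PROC REG step only satisfies the syntax. The tolerance, VIF, and condition-index output depend only on the predictor columns and the weights.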

So, how should collinearity be diagnosed and managed in logistic regression when interaction terms are included?

I also wonder if @PaigeMiller could provide some suggestions on this issue.

Thank you for your help!
