PeterBuzzacott
Calcite | Level 5

Hello

I am running a logistic regression and using the -2LogL statistic to test if removing variables significantly worsens the model.  Finally I am left with two main effects, A and B, and an interaction A*B

Removing the interaction significantly changes the model so A*B must be retained.

I then tested if removing A, or B, would make a significant difference.

Removing A was significant and B was not.
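
For readers following along: the comparison described above is a likelihood-ratio test. The increase in -2LogL when a term is dropped is compared with a chi-square distribution whose degrees of freedom equal the number of parameters removed. Here is a minimal Python sketch, assuming exactly one parameter is dropped (1 degree of freedom) and using made-up -2LogL values that do not come from the data in this thread:

```python
import math

def lr_test_df1(neg2logl_reduced, neg2logl_full):
    """Likelihood-ratio test for dropping a single parameter (1 df).

    Both arguments are -2 Log L values as reported by the fitting software.
    """
    stat = neg2logl_reduced - neg2logl_full
    if stat < 0:
        raise ValueError("the reduced model cannot fit better than the full model")
    # For 1 df the chi-square upper tail is P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical values, for illustration only.
stat, p_value = lr_test_df1(neg2logl_reduced=104.84, neg2logl_full=101.00)
print(f"LR chi-square = {stat:.2f}, p = {p_value:.3f}")
```

If a dropped term carries more than one parameter (for example a CLASS variable with three or more levels), the 1-df shortcut above does not apply and a general chi-square tail function is needed.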

It has been some years since I took a unit in Biostats, but my impression was that both main effects should be retained in a model that includes their interaction. After reading further I am no longer so sure. I read on another stats help forum that even if I remove B, SAS will still fit the full model Logit(p) = A + B + A*B, only with a different parametrization, so my question is:

Would it be valid to remove B but retain A*B if SAS fits the full model regardless?
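
The reparametrization claim can be checked on a small case. The sketch below assumes A and B are both two-level categorical (CLASS-type) variables with 0/1 dummy coding, which the thread does not state. It compares the rank of the design matrix for A + B + A*B with the one for A + A*B, where B is coded separately within each level of A. It also shows what happens when B is instead treated as a numeric column and simply dropped. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Matrix rank by exact Gauss-Jordan elimination on Fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][c] != 0:
                f = m[r][c] / m[rk][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

cells = list(product([0, 1], repeat=2))  # the four (a, b) combinations

# Intercept, A, B, A*B.
full = [[1, a, b, a * b] for a, b in cells]
# Intercept, A, B within A=0, B within A=1 (categorical A*B without a B main effect).
nested = [[1, a, (1 - a) * b, a * b] for a, b in cells]
# Intercept, A, A*B with B numeric: the B column is genuinely gone.
numeric_no_b = [[1, a, a * b] for a, b in cells]

print(rank(full), rank(nested), rank(numeric_no_b))
```

Under these assumptions the first two designs both have one free parameter per cell, so they describe the same fitted model with different coefficients. The numeric version has one parameter fewer and is a genuinely smaller model.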

B has been found to be significant in other (much larger) studies and A has not been investigated previously.  My hypothesis from the outset was that B is associated with A and that A actually has more of an effect than B per se.  My sample size is 90, whereas studies controlling for A have used n>600 to determine that B is significantly associated with the outcome variable.  My intended conclusion is that the effect size of B is less important when designing an experiment than selecting suitable A values.  I am tempted to state that removing B has less of an effect than removing A, but that they have a significant interaction.

Would that be statistically meaningful?

Many thanks in advance.

Peter

France

1 ACCEPTED SOLUTION

1zmm
Quartz | Level 8

Removing a lower-order "main effect" term because it is not statistically significant, while keeping its statistically significant interaction with another main effect, is usually not recommended.  Doing so assumes that the regression coefficient of the non-significant main effect is exactly zero, an assumption for which one usually has no prior evidence.  The following reference discusses this issue and cites earlier references on the topic:

   Nelder JA.  The selection of terms in response-surface models--How strong is the weak heredity principle?  The American Statistician 1998 Nov;52(4):315-318.
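
To illustrate why fixing the main-effect coefficient at zero is a strong assumption, here is a small Python sketch. It assumes A and B enter the model as numeric columns, and all coefficient values are hypothetical. It rewrites the same fitted model after shifting the origin of A (A' = A - c) and shows that a B main effect of zero on one scale is non-zero on the other whenever the interaction is non-zero.

```python
import math

def logit(a, b, b0, b_a, b_b, b_ab):
    """Linear predictor b0 + b_a*A + b_b*B + b_ab*A*B."""
    return b0 + b_a * a + b_b * b + b_ab * a * b

def shift_a_origin(coefs, c):
    """Coefficients of the identical model after recoding A as A' = A - c."""
    b0, b_a, b_b, b_ab = coefs
    return (b0 + b_a * c, b_a, b_b + b_ab * c, b_ab)

# Hypothetical coefficients with the B main effect set to zero.
original = (-1.0, 0.5, 0.0, 0.8)
shifted = shift_a_origin(original, c=2.0)

# Same prediction at A = 3 (that is, A' = 1) and B = 1 under both codings.
same = math.isclose(logit(3, 1, *original), logit(1, 1, *shifted))
print("B main effect, original coding:", original[2])
print("B main effect, shifted coding: ", shifted[2])
print("predictions agree:", same)
```

So "B has no main effect" is a statement about one particular coding of A, not about the data, unless there is a substantive reason to treat A = 0 as special.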

These arguments apply when you are trying to select or identify explanatory variables.  They may be irrelevant for prediction, where the selection of explanatory variables matters less.  For example, a model containing only the highest-order interaction term (a "cell means" model) may be perfectly suitable, because the interest is in the effect on the dependent variable of the combinations of terms rather than in the effects of the individual terms that make up those combinations.

plf515
Lapis Lazuli | Level 10

If B has been found to be important in other studies, you should probably include it here regardless of any issues with interactions. You can then compare the effect size you found to the effect size earlier research found (rather than comparing p values). If you find a much smaller effect, then that is quite likely to be important. (This is one of many reasons to not use p-values as a criterion for much of anything).
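
As a rough sketch of what comparing effect sizes could look like, the Python snippet below turns a logistic regression coefficient and its standard error into an odds ratio with a Wald confidence interval. The estimates and standard errors are placeholders invented for illustration, not values from this thread or from any published study.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Placeholder numbers: a small study (wide interval) and a large one (narrow interval).
studies = {
    "this study (n=90)": (0.40, 0.45),
    "earlier study (n>600)": (0.45, 0.12),
}
for label, (beta, se) in studies.items():
    or_, lo, hi = odds_ratio_ci(beta, se)
    print(f"{label}: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With numbers like these the two point estimates are close and the small study is simply less precise, which is a different conclusion from "B is not significant here".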

PeterBuzzacott
Calcite | Level 5

Thank you both for the clear answers and I will find the Nelder paper too.

All the best

Peter
