09-20-2013 12:40 PM
I am running a logistic regression and using the -2LogL statistic to test whether removing variables significantly worsens the model. I am finally left with two main effects, A and B, and their interaction, A*B.
Removing the interaction significantly worsens the model, so A*B must be retained.
I then tested whether removing A or B would make a significant difference.
Removing A made a significant difference; removing B did not.
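For reference, the -2LogL comparison described above is a likelihood-ratio test: the increase in -2LogL when a term is dropped is compared with a chi-square distribution whose degrees of freedom equal the number of parameters removed (one for a continuous or binary variable; k - 1 for a categorical variable with k levels under full-rank coding). The following is a minimal sketch using only the Python standard library; the function names and the -2LogL values are hypothetical, not the poster's results.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series (empty when df == 1)
    series, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series

def lr_test(neg2ll_reduced: float, neg2ll_full: float, df: int):
    """Likelihood-ratio test of a nested model from the two -2 log L values.

    df is the number of parameters dropped from the full model.
    """
    stat = neg2ll_reduced - neg2ll_full
    if stat < 0:
        raise ValueError("the reduced model should not fit better than the full model")
    return stat, chi2_sf(stat, df)

# Hypothetical -2 log L values (not from the original study)
stat, p = lr_test(neg2ll_reduced=110.0, neg2ll_full=100.0, df=1)
print(f"LR chi-square = {stat:.2f}, p = {p:.4f}")
```

Running the same test once for dropping A and once for dropping B (each against the model containing A, B and A*B) reproduces the two comparisons described in the question.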
It has been some years since I took a unit in Biostats, but I was under the impression that both main effects should be retained in a model that includes their interaction. Reading further, I am now not so sure. I read on another stats help forum that even if I remove B, SAS will still consider the full model logit(p) = A + B + A*B but will parametrize it differently, so my question is:
Would it be valid to remove B but retain A*B, if SAS considers the full model regardless?
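Whether the forum claim holds depends on how the A*B columns are coded. Dropping B is only a reparametrization of the full model if the A*B columns can still reproduce B's column; otherwise the reduced model genuinely constrains the effect of B to be zero at A = 0. Below is a minimal sketch that checks this with an exact rank computation on hypothetical data, assuming one indicator column per level for a categorical A. Which coding a particular SAS procedure applies (for example through the PARAM= option of the CLASS statement in PROC LOGISTIC) is not modeled here and should be checked in the procedure's documentation.

```python
from fractions import Fraction

def rank(columns):
    """Exact rank of a matrix given as a list of equal-length columns."""
    n_rows = len(columns[0])
    rows = [[Fraction(col[i]) for col in columns] for i in range(n_rows)]
    r = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def spans(columns, target):
    """True if `target` is a linear combination of `columns`."""
    return rank(columns + [target]) == rank(columns)

# Hypothetical data: six observations of a continuous B
B = [1, 2, 5, 1, 3, 4]
ones = [1] * 6

# Case 1 (assumed coding): categorical A with one indicator column per level
A_level = [0, 0, 0, 1, 1, 1]
ind = [[1 if a == k else 0 for a in A_level] for k in (0, 1)]
AxB_cat = [[d * b for d, b in zip(col, B)] for col in ind]
design_cat = [ones] + ind + AxB_cat   # intercept + A + A*B, no B
print("A categorical, B recoverable:", spans(design_cat, B))

# Case 2: continuous A, so A*B is a single product column
A = [0.5, 1, 2, 2.5, 3, 4]
AxB_cont = [a * b for a, b in zip(A, B)]
design_cont = [ones, A, AxB_cont]     # intercept + A + A*B, no B
print("A continuous, B recoverable:", spans(design_cont, B))
```

With a two-level categorical A, the per-level A*B columns sum to B, so the model without B spans the same space as the full model; with a continuous A, the single product column cannot do this, so dropping B changes the model.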
B has been found to be significant in other (much larger) studies, and A has not been investigated previously. My hypothesis from the outset was that B is associated with A and that A actually has more of an effect than B per se. My sample size is 90, whereas studies controlling for A have used n > 600 to show that B is significantly associated with the outcome variable. My intended conclusion is that, when designing an experiment, the effect size of B is less important than selecting suitable values of A. I am tempted to state that removing B has less of an effect than removing A, but that the two have a significant interaction.
Would that be statistically meaningful?
Many thanks in advance.
09-24-2013 08:11 AM
Removing a lower-order "main effect" term that does not significantly affect the dependent variable, while retaining that term's significant interaction with another main effect, is usually not recommended. Doing so assumes that the regression coefficient of the non-significant main effect is exactly zero, for which one usually has no prior evidence. The following reference discusses this issue and cites earlier work on the topic:
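One standard way to see why the zero-coefficient assumption is arbitrary: in logit(p) = b0 + bA*A + bB*B + bAB*A*B, the coefficient bB is the effect of B where A = 0, so whether it is zero depends on where the origin of A happens to be. The sketch below (hypothetical coefficients; the helper names are illustrative) re-expresses the same model with A measured from c instead of 0, and shows that a "dropped" B reappears with coefficient bAB*c.

```python
def shift_A(coefs, c):
    """Re-express logit(p) = b0 + bA*A + bB*B + bAB*A*B in terms of A' = A - c.

    Substituting A = A' + c gives
    (b0 + bA*c) + bA*A' + (bB + bAB*c)*B + bAB*A'*B.
    """
    b0, bA, bB, bAB = coefs
    return (b0 + bA * c, bA, bB + bAB * c, bAB)

def logit(coefs, a, b):
    """Linear predictor of the model with main effects A, B and interaction A*B."""
    b0, bA, bB, bAB = coefs
    return b0 + bA * a + bB * b + bAB * a * b

# Hypothetical model with B "dropped" (bB = 0) on A's original scale
original = (-1.0, 0.5, 0.0, 0.3)
shifted = shift_A(original, c=2.0)   # same model, A measured from 2 instead of 0
print("coefficients after shifting A:", shifted)

# Both parametrizations give the same linear predictor for any (A, B)
a, b = 3.0, 1.5
assert abs(logit(original, a, b) - logit(shifted, a - 2.0, b)) < 1e-12
```

The fitted model is unchanged by the shift, yet the B coefficient moves from 0 to 0.6, so "bB = 0" is a statement about an arbitrary origin for A rather than about the data.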
Nelder JA. The selection of terms in response-surface models--How strong is the weak heredity principle? The American Statistician 1998 Nov;52(4):315-318.
These arguments apply when you are trying to select or identify explanatory variables. They may be irrelevant for prediction, where the choice of explanatory variables matters less. For example, a model containing only the highest-order interaction term (a "cell means" model) may be perfectly suitable, because the interest is in the effect on the dependent variable of the multiplicative combinations of terms rather than in the effects of the individual terms that make up those combinations.
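As an illustration of the cell means model for two categorical factors: with one parameter per (A, B) cell, the maximum-likelihood fit of a logistic model simply reproduces the observed proportion in each cell, so the model can predict well without saying anything about A or B separately. This is a minimal sketch assuming both factors are categorical; the counts and level names are hypothetical.

```python
import math

def cell_means_logits(counts):
    """Saturated 'cell means' logistic model: one log-odds per (A, B) cell.

    counts maps (a_level, b_level) -> (events, trials). With one parameter per
    cell, the maximum-likelihood estimate is the observed log-odds in that cell.
    """
    out = {}
    for cell, (events, trials) in counts.items():
        if not 0 < events < trials:
            raise ValueError(f"cell {cell}: log-odds undefined for 0 or all events")
        out[cell] = math.log(events / (trials - events))
    return out

# Hypothetical counts (events, trials), not from the original study
counts = {("a1", "b1"): (3, 10), ("a1", "b2"): (6, 10),
          ("a2", "b1"): (5, 20), ("a2", "b2"): (15, 20)}
fitted = cell_means_logits(counts)
for cell, lo in fitted.items():
    p = 1 / (1 + math.exp(-lo))   # back-transform: equals the observed proportion
    print(cell, f"log-odds = {lo:.3f}, predicted p = {p:.3f}")
```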
09-26-2013 07:23 AM
If B has been found to be important in other studies, you should probably include it here regardless of any issues with interactions. You can then compare the effect size you found with the effect size found in earlier research (rather than comparing p-values). If you find a much smaller effect, that finding is quite likely to be important. (This is one of many reasons not to use p-values as a criterion for much of anything.)
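A minimal sketch of comparing effect sizes rather than p-values: convert each logistic coefficient and its standard error to an odds ratio with a 95% Wald interval, and compute a z statistic for the difference between two independent estimates. All numbers are hypothetical. The comparison is only meaningful if both coefficients are on the same scale and adjusted for comparable covariates; in particular, in a model containing A*B the coefficient of B is the effect of B where A = 0, which is not the same quantity as a main-effect estimate from a model without the interaction.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile

def odds_ratio_ci(beta, se):
    """Odds ratio and 95% Wald interval from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)

def compare_estimates(beta1, se1, beta2, se2):
    """z statistic for the difference between two independent coefficient estimates."""
    return (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical (coefficient, standard error) pairs for B on the log-odds scale
this_study = (0.20, 0.35)
published = (0.80, 0.10)
print("this study OR (95% CI):", odds_ratio_ci(*this_study))
print("published OR (95% CI):", odds_ratio_ci(*published))
print("z for difference:", compare_estimates(*this_study, *published))
```

A wide interval from a small sample that still excludes the published estimate says more about the size of B's effect than whether either p-value crosses 0.05.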