09-06-2011 08:32 PM
I fit a logistic regression model. An explanatory variable called 'location' has 3 levels (1, 2, 3), with level 3 as the reference group. The estimated regression coefficients for level 1 and level 2 are 1.102 and 1.111, respectively. As the values are very close, is it sensible to combine these two levels into a single level to simplify the model? Or is it better to keep the two levels separate?
09-06-2011 09:18 PM
Depends on the context of the data.
If, for example, level 1 is age <30, level 2 is age 30 to 60, and level 3 is age >60, then you're simply recoding to age ≤60 and age >60, which is okay as long as you don't introduce a bias into your data.
If you're combining things that don't belong together, e.g. level 1 is unknown and level 2 is Grade 1, then it doesn't make sense.
Basically, check whether it makes logical sense to collapse them from a business or interpretive perspective, and check whether the outcome distribution differs significantly between the two levels (a chi-square test usually works) to make sure you interpret things correctly.
I work in Health Care and we routinely do this.
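For anyone who wants to try the chi-square check mentioned above, here is a minimal sketch in plain Python. It compares the outcome split between location 1 and location 2 in a 2x2 table. The counts are made up for illustration and do not come from the original question, and the function name `chi_square_2x2` is just a placeholder.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table[i][j] = count for location level i (rows) and outcome j (columns).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts, NOT from the thread: rows are location 1 and location 2,
# columns are (event, no event).
counts = [[120, 80], [118, 82]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

A large p-value only says the data show no clear difference between the two levels; it does not prove they are the same, so the subject-matter check still comes first.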