Re: Predictive Modeling Using Logistic Regression
Thresholding for collapsing levels (p.3-17)
When applying thresholding (p. 3-17 of the course text), rather than grouping all small levels into a single "OTHER" level, would it not make sense to merge them into existing levels instead, based on domain knowledge or similarity in meaning (e.g. for residential status, all levels related to "renting" could be grouped together) and/or on proportion of response?
My response:
I agree with your comments. Rather than dumping rare levels into a single "OTHER" group, you could use your business knowledge or the tools available in SAS EM (the Decision Tree node or the Variable Selection node) to assign each rare level to an existing level that is similar in meaning or in proportion of response.
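
To illustrate the "proportion of response" idea outside SAS EM, here is a minimal Python sketch. It is not the course text's method or what the SAS EM nodes do internally; the function name, the min_count threshold, and the residential-status data are all made up for illustration. Each level with fewer than min_count cases is merged into the frequent level whose response rate is closest.

```python
from collections import defaultdict


def collapse_rare_levels(levels, targets, min_count):
    """Return a dict mapping every level to itself (if frequent) or to the
    frequent level with the closest response rate (if rare).

    Hypothetical helper for illustration only.
    """
    counts = defaultdict(int)
    events = defaultdict(int)
    for level, target in zip(levels, targets):
        counts[level] += 1
        events[level] += target  # target is 0/1, so this counts responders

    # Observed proportion of response per level.
    rates = {lvl: events[lvl] / counts[lvl] for lvl in counts}

    frequent = [lvl for lvl in counts if counts[lvl] >= min_count]
    if not frequent:
        raise ValueError("no level has at least min_count cases")

    mapping = {}
    for lvl in counts:
        if counts[lvl] >= min_count:
            mapping[lvl] = lvl
        else:
            # Merge the rare level into the nearest frequent level by rate.
            mapping[lvl] = min(frequent, key=lambda f: abs(rates[f] - rates[lvl]))
    return mapping


# Made-up residential-status data: two frequent and two rare levels.
levels = ["own"] * 6 + ["rent"] * 6 + ["rent_shared"] * 2 + ["with_parents"] * 2
targets = [0, 0, 0, 0, 0, 1] + [1, 1, 1, 0, 1, 0] + [1, 1] + [0, 0]

mapping = collapse_rare_levels(levels, targets, min_count=5)
collapsed = [mapping[lvl] for lvl in levels]
```

With this toy data, "rent_shared" is merged into "rent" and "with_parents" into "own". Response rates estimated from only a handful of cases are noisy, so a mapping like this should be built on the training data only and checked against domain knowledge before it is used.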