Re: Predictive Modeling Using Logistic Regression
In cluster-mean imputation (page 3-11 and appendix B-7 of the course text), should the variables used to define the clusters be restricted to those with missing values, or can they include all variables (i.e., those with and without missing values)?
Also, could you clarify the statement at the bottom of page 3.11 of the course text: “A simpler but sometimes useful alternative is to define a priori segments (for example, high, middle, low and unknown income) and then do mean or median imputation within each segment”?
I am not sure I understand the benefit of creating these segments, although I do understand how the example on page 3.12 works. Is the wording on page 3.11 correct?
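
To make the question concrete, here is how I read the segment-based alternative, as a minimal Python sketch. The data, the segment labels, the variable names, and the function `impute_within_segments` are my own assumptions for illustration and are not taken from the course text.

```python
from statistics import median


def impute_within_segments(rows, segment_key, value_key):
    """Replace missing (None) values with the median of the observed
    values in the same a priori segment. Returns new rows; the input
    is left unchanged."""
    # Collect the observed (non-missing) values for each segment.
    observed = {}
    for row in rows:
        if row[value_key] is not None:
            observed.setdefault(row[segment_key], []).append(row[value_key])

    # One median per segment that has at least one observed value.
    medians = {seg: median(vals) for seg, vals in observed.items()}

    imputed = []
    for row in rows:
        new_row = dict(row)
        if new_row[value_key] is None:
            # Stays None if the segment has no observed values at all.
            new_row[value_key] = medians.get(new_row[segment_key])
        imputed.append(new_row)
    return imputed


# Hypothetical data: an a priori income segment and a variable with gaps.
customers = [
    {"income_segment": "high", "balance": 900.0},
    {"income_segment": "high", "balance": 1100.0},
    {"income_segment": "high", "balance": None},
    {"income_segment": "low", "balance": 100.0},
    {"income_segment": "low", "balance": None},
    {"income_segment": "low", "balance": 300.0},
    {"income_segment": "unknown", "balance": 500.0},
    {"income_segment": "unknown", "balance": None},
]

filled = impute_within_segments(customers, "income_segment", "balance")
```

With these made-up numbers, the missing high-income balance is filled with 1000 and the missing low-income balance with 200, whereas a single overall median would assign 500 to both. If this reading is right, I assume the intended benefit is that the imputed value reflects the segment rather than the whole sample.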