Imputation modifies the distribution of the input variables. We don't like that, but sometimes we have to live with it. Modifying it in two different ways (one way for training, another for validation) is even worse, I think. As you see, this is not a formal answer, and I admit I have never experimented with the two approaches.

If your final goal is to create a predictive model (?), which imputation technique will you use when you do prediction for the unknown cases? The one derived from the training dataset or from the validation dataset?

By multiple imputation, do you mean what PROC MI does? Mixing multiple imputation with a validation technique might be difficult. Please, someone correct me:
- MI is typically used when we have a rather small dataset (with missing values) and a (theoretical) model that we want to estimate.
- A validation dataset is usually used when we don't know the model exactly, so we try a series of models and use the validation dataset to select the best one. Typically we have more observations (and also more columns) in this case.

As a first try, I would simply concatenate the training and validation datasets and run the imputation on that (but without using the target variable). You will still need to decide how to impute, using multiple imputation, when you predict a single unknown case.

If you use random forests (or a tree), do you need imputation at all?
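To make the "which imputation do you reuse at prediction time" point concrete, here is a minimal sketch (in Python, with made-up data and a simple mean imputer rather than PROC MI): the imputation values are derived from the training data only, and those same values are reused for the validation set and for any new, unknown case.

```python
def fit_mean_imputer(rows):
    """Compute per-column means from the training rows, ignoring missing values (None)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def impute(rows, means):
    """Replace each missing value with the mean derived from the training data."""
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

# Hypothetical toy data: two input columns, None marks a missing value.
train = [[1.0, 2.0], [3.0, None], [5.0, 4.0]]
valid = [[None, 6.0]]

means = fit_mean_imputer(train)   # derived from training data only
train_imputed = impute(train, means)
valid_imputed = impute(valid, means)  # the same training-derived means are reused
```

The alternative criticized above would be calling `fit_mean_imputer` separately on the validation set, giving two different imputation rules for the same variable.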