I'm trying to understand PROC MI, so I have a few, possibly silly, questions:
* Can PROC MI be used for predicting missing values? That is the main idea of it, right? What I mean is: can it be used if I need to predict the single most probable value for an observation?
I'm confused. If I want one predicted value and I have 5 different values from 5 imputations, which one should I use? The one from the last imputation, or some kind of average of all of them?
* Is there perhaps a better way to predict missing values?
Yes, the whole point of PROC MI is to predict missing values. But you don't want to predict just one value for a missing response. You want to generate multiple data sets, each containing its own predicted value for every missing response.
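To make the idea of "multiple data sets with different predicted values" concrete, here is a minimal Python sketch of stochastic regression imputation for one predictor `x` and a response `y` with missing values (marked `None`). It is not what PROC MI does internally; the helper names `fit_line` and `impute` are hypothetical. It fits a least-squares line to the complete cases, then fills each missing `y` with the predicted value plus a random residual, repeated `m` times. Proper multiple imputation methods (such as the regression method in PROC MI) also redraw the regression parameters for each imputation; this simplified sketch only adds residual noise.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; also returns the residual SD."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sigma


def impute(x, y, m=5, seed=1):
    """Return m completed copies of y; None marks a missing response.

    Simplification (assumption): the regression parameters are estimated once
    from the complete cases and are NOT redrawn for each imputation.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sigma = fit_line([p[0] for p in observed], [p[1] for p in observed])
    rng = random.Random(seed)
    datasets = []
    for _ in range(m):
        # Each imputation adds a fresh random residual, so the filled-in values differ.
        completed = [
            yi if yi is not None else a + b * xi + rng.gauss(0, sigma)
            for xi, yi in zip(x, y)
        ]
        datasets.append(completed)
    return datasets


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, None, 8.2, 9.8, None]
    for d in impute(x, y):
        print([round(v, 2) for v in d])
```

Observed values are identical in every copy; only the filled-in entries vary. That spread across copies is exactly the uncertainty that a single "most probable" value would hide.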
Each imputed data set is then analyzed with a procedure (for example, a regression procedure) that produces parameter estimates and a covariance matrix for those estimates. The point to note here is that the parameter estimates will vary from one imputed data set to the next, because you do not know the "true" values of the observations that were recorded as missing. This variation provides information about the amount of uncertainty that arises from using imputed data.
After generating parameter estimates and their covariances for each imputed data set, you post-process the multiple sets of estimates to arrive at a single parameter vector with an associated covariance matrix. This represents your best estimate of the relationship of the response to the predictors, given the limitations of using imputed data. For this last step, you would use the MIANALYZE procedure, which combines the results using Rubin's rules. So the averaging happens at the level of the parameter estimates, not by picking one imputation's values.
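The combining step can be illustrated for a single scalar parameter with a short Python sketch of Rubin's rules. This is not PROC MIANALYZE's implementation (which also handles parameter vectors, covariance matrices, and degrees of freedom); the function name `combine_rubin` is hypothetical. The inputs are assumed to be one estimate and its squared standard error from each of the `m` per-imputation analyses. The pooled estimate is the mean of the estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool one scalar parameter across m imputations using Rubin's rules.

    estimates: per-imputation point estimates of the parameter
    variances: per-imputation squared standard errors of those estimates
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, w, b, t


if __name__ == "__main__":
    est = [1.0, 1.2, 0.8, 1.1, 0.9]
    var = [0.04] * 5
    print(tuple(round(v, 4) for v in combine_rubin(est, var)))
    # prints (1.0, 0.04, 0.025, 0.07)
```

The `(1 + 1/m) * b` term is what carries the extra uncertainty from imputation into the final standard error; with a single imputation there would be no way to estimate it.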
To get a better perspective on the entire process, refer to the Getting Started section of the MIANALYZE procedure documentation. It illustrates the whole workflow: generating multiple imputed data sets, fitting a regression model to each one, and then combining the parameter estimates with the MIANALYZE procedure. That documentation can be found at: