01-03-2014 12:22 PM
I have looked through a lot of documentation from sas.com and papers by other authors, but I can't find how to use the minimum statement with the FCS statement of proc MI.
More specifically, I have a data set of continuous and discrete variables. I would like to use the fcs statement of proc mi to replace the missing values. In doing this, I want to use
logistic regression for the classification variables and ordinary regression for the continuous variables. Because regression could lead to negative imputed values, I want to
impose a minimum of 0. I have no idea where to put the minimum statement in the syntax described in the SAS/STAT(R) 9.3 User's Guide.
I have tried putting the statement in several places without success.
Could anyone help me out with this?
01-03-2014 02:16 PM
It appears that you would have something like:
proc mi data=yourdata minimum=0;
This is as per the SAS/STAT 12.3 (SAS 9.4) documentation: MINIMUM= is an option on the PROC MI statement itself, not on the FCS statement.
However, even given all of that, the regression method assumes that the data are Gaussian. This may not be the case for your data. An easy way to avoid values less than zero is to log-transform the continuous variables prior to imputation (which requires strictly positive values), and then back-transform everything after imputation.
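Combining the MINIMUM= option with the FCS statement from the original question, a minimal sketch could look like the following. The data set name yourdata, the continuous variables x1 and x2, the classification variable c1, the seed, and the number of imputations are all hypothetical placeholders, not anything taken from the poster's data. When more than one number is given, the values in MINIMUM= follow the order of the VAR statement, and a missing value (.) means no bound for that variable.

```sas
/* Sketch only: yourdata, x1, x2 and c1 are hypothetical names.        */
/* MINIMUM= sits on the PROC MI statement; values follow the VAR order. */
proc mi data=yourdata nimpute=5 seed=20140103 out=mi_out
        minimum=0 0 .        /* 0 for x1 and x2, no bound for c1       */
        minmaxiter=200;      /* redraw limit for out-of-range values   */
   class c1;
   /* logistic regression for the classification variable,            */
   /* ordinary regression for each continuous variable                 */
   fcs nbiter=20 logistic(c1) reg(x1) reg(x2);
   var x1 x2 c1;
run;
```

An imputed value below the minimum is redrawn, up to MINMAXITER= times, so a tight bound on a variable with many low values can slow the run noticeably.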
01-03-2014 02:22 PM
Thank you very much for your answer. I had considered the assumption that the data have to be Gaussian.
However, I'm then wondering which other multiple imputation method I could use, given my non-monotone missing pattern.
Do you have any idea whether the assumption is a requirement, or whether violating it just lowers the power of the subsequent analysis?
01-03-2014 02:31 PM
Well, let's look at the continuous variables--what are they? How are they defined or measured? Suppose the data were blood levels of some metabolite. It would be a natural assumption that they have a lognormal distribution. Thus, if I were imputing using PROC MI, I would transform all of the measurements by taking the log of the value. Missing is still missing, but I could now use the regression method to impute the log of the missing values. Does that make sense?
So I guess it comes down to what those continuous variables are.
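The transform, impute, back-transform sequence described above can be sketched roughly as follows. All names (yourdata, x1, c1, log_x1, x1_imp) are hypothetical, and the sketch assumes x1 is strictly positive wherever it is observed. A zero would be left missing on the log scale and would then be imputed like any other missing value, which is probably not what you want.

```sas
/* Sketch only: yourdata, x1 and c1 are hypothetical names. */

/* 1. Log-transform; missing stays missing. */
data logdata;
   set yourdata;
   if x1 > 0 then log_x1 = log(x1);   /* zeros and missings stay missing */
run;

/* 2. Impute on the log scale. */
proc mi data=logdata nimpute=5 seed=20140103 out=mi_log;
   class c1;
   fcs logistic(c1) reg(log_x1);
   var log_x1 c1;
run;

/* 3. Back-transform; exp() is always positive, so no minimum is needed. */
data mi_back;
   set mi_log;
   x1_imp = exp(log_x1);
run;
```

The output data set keeps the _Imputation_ variable that PROC MI adds, so later analyses can still be run by imputation.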
Another entirely different approach is to not impute at all, but use maximum likelihood methods for your estimations. Provided the data are at least MAR, these estimates will be asymptotically unbiased.
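As one possible illustration of the no-imputation route, and only as an assumption about what the eventual analysis might be, a linear model for continuous variables can be fit by full-information maximum likelihood with PROC CALIS. The names yourdata, y, x1 and x2 are hypothetical. This estimator assumes multivariate normality and does not handle the classification variables from the original question.

```sas
/* Sketch only: yourdata, y, x1 and x2 are hypothetical names.      */
/* METHOD=FIML uses every observation, including incomplete ones.   */
proc calis data=yourdata method=fiml;
   path y <--- x1 x2;    /* regression of y on x1 and x2 */
run;
```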
The cure can be worse than the disease here. Setting a minimum can introduce biases (and slow runtime). Log-transformation can distort the relationship between variables, and there is no guarantee that the logged variables will be any closer to normal than the original variables. Often the best approach is just to hold your nose and impute non-normal variables as though they were normal. It's not optimal -- and hopefully better options will become available through PROC MI -- but often the results are not too bad.
I have written extensively about this issue in this paper: