I was using PROC MI (multiple imputation) with the FCS regression method to impute missing values in continuous variables such as weight, height and age. Most of the imputed values looked good and produced a sensible distribution, but a few of them came out negative.
After searching online, I found some posts on R and Python blogs where people set a lower bound of 0 for variables like age, weight and height, which can't plausibly take negative values. I was planning to do the same, but I couldn't find any option in PROC MI that allows a lower bound to be set.
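For context, the kind of call I mean looks roughly like the sketch below. The dataset names (work.patients, work.mi_out), the seed and the number of imputations are placeholders for illustration, not my actual settings.

```sas
/* Sketch of an FCS regression imputation for three continuous variables. */
/* work.patients, work.mi_out, seed= and nimpute= are hypothetical values. */
proc mi data=work.patients nimpute=5 seed=12345 out=work.mi_out;
  /* Impute each listed variable with the regression method */
  fcs reg(weight height age);
  var weight height age;
run;
```

The output dataset stacks the completed datasets and identifies each one with the _Imputation_ variable, which is where the negative values show up.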
Some people using other programming languages also tried tactics like:
Set negative values to 0 using: if age<0 then age=0
Flip the sign using the absolute value, so that -5 becomes 5: age = abs(age)
Personally, I would have thought both might skew the distribution a bit?
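To make that worry concrete, here is a minimal sketch that applies both tactics to simulated values and compares summary statistics. The normal distribution with mean 5 and standard deviation 4 is an assumption chosen only so that some draws fall below 0; it is not my real data, and the variable names are made up.

```sas
/* Simulate values that sometimes fall below 0, then apply both fixes. */
/* Mean 5 and SD 4 are arbitrary illustration values, not real data.   */
data sim;
  call streaminit(2024);
  do i = 1 to 10000;
    age_raw = rand('normal', 5, 4);

    /* Tactic 1: set negative values to 0.                          */
    /* The missing() guard matters in SAS: a missing value compares */
    /* as less than 0, so a bare "if age<0" would also zero it out. */
    age_floor = age_raw;
    if not missing(age_floor) and age_floor < 0 then age_floor = 0;

    /* Tactic 2: reflect negative values with the absolute value */
    age_abs = abs(age_raw);

    output;
  end;
  drop i;
run;

/* Compare the three versions side by side */
proc means data=sim n mean std min max;
  var age_raw age_floor age_abs;
run;
```

Both tactics only ever move values upward, so I would expect the mean to rise and the standard deviation to shrink relative to age_raw, which is the kind of distortion I was worried about.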
While I was searching for answers on the internet, I found this comment in a statistics textbook, which suggests you just leave the implausible values alone:
"Intuitively speaking, it makes sense to round values or incorporate bounds to give plausible values. However, these methods have been shown to decrease efficiency and increase bias by altering the correlation or covariances between variables estimated during the imputation process. Additionally, these changes will often result in an underestimation of the uncertainty around imputed values. Remember imputed values are NOT equivalent to observed values and serve only to help estimate the covariances between variables needed for inference (Johnson and Young 2011)."
Does anyone have thoughts about using lower bounds here? If you've used them in a situation like this, did it work out well? I couldn't find any similar posts on this website, so I hope this thread will also be useful for others using imputation methods who want to see how other people have handled it. Thanks for your thoughts.