In the case of one variable, MI is similar to bootstrap resampling. For each imputed sample, you can replace each missing value with a random value drawn from the nonmissing (observed) values. For example, when forest=1, the observed values of A_forest are
1 value of 0
1 value of 0.018552237
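8 values of 0.070685835
That is 10 observed values, so the empirical probabilities of 0, 0.018552237, and 0.070685835 are 0.1, 0.1, and 0.8, respectively.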
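Rather than typing these probabilities by hand, you could let PROC FREQ compute them with the OUT= option, which writes each observed value with its COUNT and PERCENT. The sketch below is self-contained, so it builds a small toy data set (NFI_toy, a hypothetical name) from the counts listed above. The two missing values in it are an assumption for illustration only. On your real data, use NFI instead.

```sas
/* Hypothetical toy stand-in for the forest=1 rows: the 10 observed values
   described above plus two missing values (the missing count is assumed) */
data NFI_toy;
   Forest = 1;
   input A_forest @@;
   datalines;
0 0.018552237 0.070685835 0.070685835 0.070685835 0.070685835
0.070685835 0.070685835 0.070685835 0.070685835 . .
;
run;

/* Empirical distribution of the observed values: COUNT and PERCENT per value */
proc freq data=NFI_toy noprint;
   where Forest=1 and not missing(A_forest);
   tables A_forest / out=ObsDist;
run;

proc print data=ObsDist;
run;
```

Dividing PERCENT by 100 gives the probabilities for the Prob array, and the A_forest column gives the matching entries for the Value array.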
It's not clear to me what you want to do with the forest=0 data, which doesn't have missing values. Copy it over to each imputed set?
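If copying is what you want, one possible sketch is to output each forest=0 observation once per imputation, unchanged. To keep the example self-contained, it uses a hypothetical toy data set (NFI0_toy) with made-up placeholder values. On your real data, replace NFI0_toy with NFI.

```sas
/* Hypothetical toy stand-in for the forest=0 rows (placeholder values, no missing values) */
data NFI0_toy;
   Forest = 0;
   input A_forest @@;
   datalines;
0.05 0.02 0
;
run;

/* Repeat every forest=0 observation unchanged in each of the 5 imputed sets */
data Copy0;
   set NFI0_toy(where=(Forest=0));
   ObsNum = _N_;
   do _Imputation_ = 1 to 5;
      output;
   end;
run;
```

You could then stack Copy0 with the imputed forest=1 data (for example, by listing both data sets on one SET statement) and sort the result by _Imputation_.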
Anyway, for the forest=1 data, you can write a program such as the following to replace missing values with a random observed value:
/* initial distribution of values */
proc freq data=NFI;
   where forest=1;
   tables A_forest / missprint;
run;
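/* multiple imputations of the forest=1 data */
data Impute;
   call streaminit(54321);
   array Value[3] _temporary_ (0.070685835, 0.018552237, 0);  /* observed values */
   array Prob[3]  _temporary_ (0.8, 0.1, 0.1);                /* their empirical probabilities */
   set NFI(where=(Forest=1));
   ObsNum = _N_;
   Orig = A_forest;              /* save the original (possibly missing) value */
   do _Imputation_ = 1 to 5;
      if missing(Orig) then do;  /* draw a fresh random value for every imputed set */
         i = rand("Table", of Prob[*]);
         A_forest = Value[i];
      end;
      output;
   end;
   drop Orig i;
run;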
proc sort data=Impute;
   by _Imputation_ ObsNum;
run;
/* final distribution of values across all imputed sets */
proc freq data=Impute;
   tables A_forest / missprint;
run;