Re: How do I impute missing values for a single variable using PROC MI...

ThomasNord · Posted 07-08-2019 06:38 AM

Hi all,

I am working with forest inventory data where some plots were not visited in the field. I would like to impute the missing values of forest cover within those plots. I have tried using PROC MI (SAS ver. 9.4), but keep getting the message: "ERROR: Fewer than two analysis variables". In the inserted code, NFI2KMCL and ssu together form the plot identifier, "forest" indicates whether the plot has been identified as forest (0 or 1 in the sample data below), and "A_forest" is the measured forest area, which is sometimes missing and needs to be imputed (values can only range from 0 to 0.0706 hectares, as the circular plots have a radius of 15 m).

Hope that someone can help me out!

Thomas

data NFI;
input nfi2kmcl :$20. ssu $  forest  A_forest;
cards;
2km_6396_588_EUREF89   C   0   0
2km_6396_588_EUREF89   E   0   0
2km_6070_660_EUREF89   E   1   0.070685835
2km_6070_660_EUREF89   G   1   0.018552237
2km_6070_662_EUREF89   A   1   .
2km_6070_662_EUREF89   G   1   .
2km_6070_666_EUREF89   A   1   .
2km_6070_666_EUREF89   G   1   .
2km_6070_672_EUREF89   C   1   0.070685835
2km_6070_672_EUREF89   E   1   0.070685835
2km_6070_672_EUREF89   G   1   0.070685835
2km_6070_688_EUREF89   A   1   0.070685835
2km_6070_688_EUREF89   E   1   0.070685835
2km_6070_688_EUREF89   G   1   0.070685835
2km_6080_524_EUREF89   C   1   0
2km_6080_524_EUREF89   E   1   0.070685835
2km_6080_526_EUREF89   A   1   .
2km_6080_526_EUREF89   G   1   .
2km_6080_528_EUREF89   A   1   .
2km_6080_528_EUREF89   C   1   .
;


proc mi data=NFI seed=501213 nimpute=6 min=0 max=0.070686 out=NFI_out;
	mcmc;
	var A_forest;
	by forest;
run;

Kurt_Bremser · Posted 07-08-2019 07:26 AM

What if you let it work over two variables:

data NFI;
input nfi2kmcl :$20. ssu $  forest  A_forest;
cards;
2km_6396_588_EUREF89   C   0   0
2km_6396_588_EUREF89   E   0   0
2km_6070_660_EUREF89   E   1   0.070685835
2km_6070_660_EUREF89   G   1   0.018552237
2km_6070_662_EUREF89   A   1   .
2km_6070_662_EUREF89   G   1   .
2km_6070_666_EUREF89   A   1   .
2km_6070_666_EUREF89   G   1   .
2km_6070_672_EUREF89   C   1   0.070685835
2km_6070_672_EUREF89   E   1   0.070685835
2km_6070_672_EUREF89   G   1   0.070685835
2km_6070_688_EUREF89   A   1   0.070685835
2km_6070_688_EUREF89   E   1   0.070685835
2km_6070_688_EUREF89   G   1   0.070685835
2km_6080_524_EUREF89   C   1   0
2km_6080_524_EUREF89   E   1   0.070685835
2km_6080_526_EUREF89   A   1   .
2km_6080_526_EUREF89   G   1   .
2km_6080_528_EUREF89   A   1   .
2km_6080_528_EUREF89   C   1   .
;

proc mi data=NFI seed=501213 nimpute=6 min=0 max=0.070686 out=NFI_out;
mcmc;
var forest A_forest;
run;


ThomasNord · Posted 07-08-2019 07:40 AM

Well, then it works of course, but the intention was to impute only the variable of interest. If I use some other variable just to make it run, that variable will affect the result of the imputation ... at least as far as I understand it.

Thomas

Kurt_Bremser · Posted 07-08-2019 07:52 AM

It's my guess that MI uses the second variable as some kind of "help". But I'm no statistician, maybe @Rick_SAS can provide more insight.


SAS_Rob · Posted 07-08-2019 10:22 AM

MI is meant to impute based on a multivariate distribution and thus needs more than 1 variable.

ThomasNord · Posted 07-08-2019 12:17 PM

Are there any other SAS procedures made for single-variable imputation that you could recommend using instead?

Thanks for the reply

Thomas

Rick_SAS · Posted 07-08-2019 12:53 PM

In the case of one variable, MI is similar to bootstrap resampling. For each imputed sample, you can replace each missing value with a random value drawn from the nonmissing (observed) values. For example, when forest=1, your data has

1 value of 0

1 value of 0.018552237

8 values of 0.070685835

It's not clear to me what you want to do with the forest=0 data, which doesn't have missing values. Copy it over to each imputed set?

Anyway, for the forest=1 data, you can write a program such as the following to replace missing values with a random observed value:

/* initial distribution of values */
proc freq data=NFI;
where forest=1;
tables A_forest / missprint;
run;

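/* multiple imputations of the forest=1 data */
data Impute;
call streaminit(54321);
/* observed values and their relative frequencies (8/10, 1/10, 1/10) */
array Value[3] _temporary_ (0.070685835, 0.018552237, 0);
array Prob[3] _temporary_ (0.8, 0.1, 0.1);
set NFI(where=(Forest=1));
ObsNum = _N_;
/* remember whether the original value was missing; otherwise the first
   draw fills A_forest and later imputations would reuse that same value */
wasMissing = missing(A_forest);
do _Imputation_ = 1 to 5;
   if wasMissing then do;
      i = rand("Table", of Prob[*]);   /* index 1-3 drawn with probabilities Prob */
      A_forest = Value[i];
   end;
   output;
end;
drop i wasMissing;
run;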

proc sort data=Impute;
   by _Imputation_ ObsNum;
run;

/* final distribution of values across all imputed sets */
proc freq data=Impute;
   tables A_forest / missprint;
run;
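
The Value and Prob arrays above are typed in by hand from the PROC FREQ output. As a possible variation, the following sketch draws each replacement directly from the observed forest=1 values (a simple random hot-deck), so nothing has to be hard-coded. The data set names Donors and Impute2 and the variables wasMissing, k, and nDonors are made up for this illustration, and five imputations are assumed as in the program above. Treat it as a starting point, not a substitute for a proper imputation model.

```sas
/* donor pool: observed (nonmissing) A_forest values for forest=1 */
data Donors;
   set NFI(where=(forest=1 and not missing(A_forest)));
   keep A_forest;
run;

/* random hot-deck: each missing value is replaced by a randomly chosen donor */
data Impute2;
   call streaminit(54321);
   set NFI(where=(forest=1));
   ObsNum = _N_;
   wasMissing = missing(A_forest);          /* flag taken before any overwrite */
   do _Imputation_ = 1 to 5;
      if wasMissing then do;
         k = ceil(nDonors * rand("Uniform")); /* uniform row index 1..nDonors */
         set Donors point=k nobs=nDonors;     /* read A_forest from donor row k */
      end;
      output;
   end;
   drop wasMissing;
run;
```

Because every donor row is equally likely, a value that occurs 8 times out of 10 among the observed data is drawn with probability 0.8, which matches the Prob array in the hand-coded version. The sorting and PROC FREQ steps can then be run on Impute2 in the same way.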
