ThomasNord
Calcite | Level 5

Hi all,

 

I am working with forest inventory data where some plots were not visited in the field. I would like to impute the missing values of forest cover within those plots. I have tried using PROC MI (SAS ver. 9.4), but keep getting the message: "ERROR: Fewer than two analysis variables". In the code below, NFI2KMCL and ssu together identify the plot, "forest" indicates whether the plot has been identified as forest (coded 0 or 1 in the data below), and "A_forest" is the measured forest area, which is sometimes missing and needs to be imputed (values can only range from 0 to 0.0706 hectares, as the circular plots have a radius of 15 m).

 

Hope that someone can help me out!

 

Thomas

 

data NFI;
input nfi2kmcl :$20. ssu $  forest  A_forest;
cards;
2km_6396_588_EUREF89   C   0   0
2km_6396_588_EUREF89   E   0   0
2km_6070_660_EUREF89   E   1   0.070685835
2km_6070_660_EUREF89   G   1   0.018552237
2km_6070_662_EUREF89   A   1   .
2km_6070_662_EUREF89   G   1   .
2km_6070_666_EUREF89   A   1   .
2km_6070_666_EUREF89   G   1   .
2km_6070_672_EUREF89   C   1   0.070685835
2km_6070_672_EUREF89   E   1   0.070685835
2km_6070_672_EUREF89   G   1   0.070685835
2km_6070_688_EUREF89   A   1   0.070685835
2km_6070_688_EUREF89   E   1   0.070685835
2km_6070_688_EUREF89   G   1   0.070685835
2km_6080_524_EUREF89   C   1   0
2km_6080_524_EUREF89   E   1   0.070685835
2km_6080_526_EUREF89   A   1   .
2km_6080_526_EUREF89   G   1   .
2km_6080_528_EUREF89   A   1   .
2km_6080_528_EUREF89   C   1   .
;


proc mi data=NFI seed=501213 nimpute=6 min=0 max=0.070686 out=NFI_out;
	mcmc;
	var A_forest;
	by forest;
run;

 

Kurt_Bremser
Super User

What if you let it work over two variables:

data NFI;
input nfi2kmcl :$20. ssu $  forest  A_forest;
cards;
2km_6396_588_EUREF89   C   0   0
2km_6396_588_EUREF89   E   0   0
2km_6070_660_EUREF89   E   1   0.070685835
2km_6070_660_EUREF89   G   1   0.018552237
2km_6070_662_EUREF89   A   1   .
2km_6070_662_EUREF89   G   1   .
2km_6070_666_EUREF89   A   1   .
2km_6070_666_EUREF89   G   1   .
2km_6070_672_EUREF89   C   1   0.070685835
2km_6070_672_EUREF89   E   1   0.070685835
2km_6070_672_EUREF89   G   1   0.070685835
2km_6070_688_EUREF89   A   1   0.070685835
2km_6070_688_EUREF89   E   1   0.070685835
2km_6070_688_EUREF89   G   1   0.070685835
2km_6080_524_EUREF89   C   1   0
2km_6080_524_EUREF89   E   1   0.070685835
2km_6080_526_EUREF89   A   1   .
2km_6080_526_EUREF89   G   1   .
2km_6080_528_EUREF89   A   1   .
2km_6080_528_EUREF89   C   1   .
;

proc mi data=NFI seed=501213 nimpute=6 min=0 max=0.070686 out=NFI_out;
mcmc;
var forest A_forest;
run;
ThomasNord
Calcite | Level 5

Well, then it works of course, but the intention was to impute only the variable of interest. If I use some other variable just to make it run, that variable will affect the result of the imputation ... at least as far as I understand it.

 

Thomas

SAS_Rob
SAS Employee

MI is meant to impute based on a multivariate distribution and thus needs more than 1 variable.

ThomasNord
Calcite | Level 5

Are there any other SAS procedures made for single-variable imputation that you could recommend using instead?

 

Thanks for the reply

 

Thomas

Rick_SAS
SAS Super FREQ

In the case of one variable, MI is similar to bootstrap resampling. For each imputed sample, you can replace each missing value with a random value drawn from the nonmissing values. For example, when forest=1, your data has

1 value of 0

1 value of 0.018552237

8 values of 0.070685835

 

It's not clear to me what you want to do with the forest=0 data, which doesn't have missing values. Copy it over to each imputed set?

Anyway, for the forest=1 data, you can write a program such as the following to replace missing values with a random observed value:

 

/* initial distribution of values */
proc freq data=NFI;
   where forest=1;
   tables A_forest / missprint;
run;

/* multiple imputations of the forest=1 data */
data Impute;
   call streaminit(54321);
   /* observed values and their relative frequencies (8/10, 1/10, 1/10) */
   array Value[3] _temporary_ (0.070685835, 0.018552237, 0);
   array Prob[3] _temporary_ (0.8, 0.1, 0.1);
   set NFI(where=(Forest=1));
   ObsNum = _N_;
   /* remember whether the value was missing before any imputation */
   isMissing = missing(A_forest);
   do _Imputation_ = 1 to 5;
      if isMissing then do;
         /* new random draw for each imputation */
         i = rand("Table", of Prob[*]);
         A_forest = Value[i];
      end;
      output;
   end;
   drop isMissing;
run;

proc sort data=Impute;
   by _Imputation_ ObsNum;
run;

/* final distribution of values across all imputed sets */
proc freq data=Impute;
   tables A_forest / missprint;
run;
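
If you would rather not type the values and probabilities in by hand, a variation is to draw directly from the pool of observed forest=1 values, which yields the same 0.8 / 0.1 / 0.1 probabilities for this data. The following is only a minimal sketch and I have not run it against your full data: the data set name ImputeDonor, the variables nDonors and wasMissing, and the array bound of 100 donors are my own assumptions for illustration.

```sas
/* Sketch only: hot-deck style draw from the observed forest=1 values.
   The array bound of 100 donors is an assumed upper limit; raise it for larger data. */
data ImputeDonor;
   call streaminit(54321);
   array Donor[100] _temporary_;

   /* on the first iteration, load every observed value into the donor pool */
   if _N_ = 1 then do until (eof);
      set NFI(where=(forest=1 and not missing(A_forest))) end=eof;
      nDonors + 1;
      Donor[nDonors] = A_forest;
   end;

   set NFI(where=(forest=1));
   ObsNum = _N_;
   /* remember whether the value was missing before any imputation */
   wasMissing = missing(A_forest);
   do _Imputation_ = 1 to 5;
      if wasMissing then do;
         /* pick one donor uniformly at random, anew for each imputation */
         i = ceil(nDonors * rand("Uniform"));
         A_forest = Donor[i];
      end;
      output;
   end;
   drop nDonors i wasMissing;
run;
```

Because every donor is equally likely to be chosen, a value that occurs 8 times among 10 observed values is selected with probability 0.8, so the result should match the hard-coded version while adapting automatically if the observed data change.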
