How do I impute missing values for a single variable using PROC MI?

Posted 07-08-2019 06:38 AM

Hi all,

I am working with forest inventory data in which some field plots were not visited. I would like to impute the missing values of forest cover within these plots. I have tried PROC MI (SAS 9.4), but I keep getting the message "ERROR: Fewer than two analysis variables". In the code below, NFI2KMCL and ssu together identify a plot, "forest" indicates whether the plot has been identified as forest (0 or 1), and "A_forest" is the measured forest area. A_forest is sometimes missing and needs to be imputed; its values can only range from 0 to about 0.0707 ha, because the circular plots have a radius of 15 m.
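For reference, the upper bound follows from the area of a circular plot with radius r = 15 m. It matches the value 0.070685835 that appears in the data and the MAX=0.070686 threshold used in the PROC MI call:

```latex
A_{\max} = \pi r^2 = \pi \,(15\ \text{m})^2 \approx 706.86\ \text{m}^2 \approx 0.070686\ \text{ha}
\qquad (1\ \text{ha} = 10\,000\ \text{m}^2)
```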

Hope that someone can help me out!

Thomas

```
data NFI;
input nfi2kmcl :$20. ssu $ forest A_forest;
cards;
2km_6396_588_EUREF89 C 0 0
2km_6396_588_EUREF89 E 0 0
2km_6070_660_EUREF89 E 1 0.070685835
2km_6070_660_EUREF89 G 1 0.018552237
2km_6070_662_EUREF89 A 1 .
2km_6070_662_EUREF89 G 1 .
2km_6070_666_EUREF89 A 1 .
2km_6070_666_EUREF89 G 1 .
2km_6070_672_EUREF89 C 1 0.070685835
2km_6070_672_EUREF89 E 1 0.070685835
2km_6070_672_EUREF89 G 1 0.070685835
2km_6070_688_EUREF89 A 1 0.070685835
2km_6070_688_EUREF89 E 1 0.070685835
2km_6070_688_EUREF89 G 1 0.070685835
2km_6080_524_EUREF89 C 1 0
2km_6080_524_EUREF89 E 1 0.070685835
2km_6080_526_EUREF89 A 1 .
2km_6080_526_EUREF89 G 1 .
2km_6080_528_EUREF89 A 1 .
2km_6080_528_EUREF89 C 1 .
;
proc mi data=NFI seed=501213 nimpute=6 min=0 max=0.070686 out=NFI_out;
mcmc;
var A_forest;
by forest;
run;
```

What if you let it work on two variables?

```
data NFI;
input nfi2kmcl :$20. ssu $ forest A_forest;
cards;
2km_6396_588_EUREF89 C 0 0
2km_6396_588_EUREF89 E 0 0
2km_6070_660_EUREF89 E 1 0.070685835
2km_6070_660_EUREF89 G 1 0.018552237
2km_6070_662_EUREF89 A 1 .
2km_6070_662_EUREF89 G 1 .
2km_6070_666_EUREF89 A 1 .
2km_6070_666_EUREF89 G 1 .
2km_6070_672_EUREF89 C 1 0.070685835
2km_6070_672_EUREF89 E 1 0.070685835
2km_6070_672_EUREF89 G 1 0.070685835
2km_6070_688_EUREF89 A 1 0.070685835
2km_6070_688_EUREF89 E 1 0.070685835
2km_6070_688_EUREF89 G 1 0.070685835
2km_6080_524_EUREF89 C 1 0
2km_6080_524_EUREF89 E 1 0.070685835
2km_6080_526_EUREF89 A 1 .
2km_6080_526_EUREF89 G 1 .
2km_6080_528_EUREF89 A 1 .
2km_6080_528_EUREF89 C 1 .
;
proc mi data=NFI seed=501213 nimpute=6 min=0 max=0.070686 out=NFI_out;
mcmc;
var forest A_forest;
run;
```

Well, then it works, of course, but the intention was to impute only the variable of interest. If I add another variable just to make it run, that variable will affect the result of the imputation, at least as far as I understand it.

Thomas

MI is meant to impute based on a multivariate distribution and thus needs more than one variable.

Are there any other SAS procedures designed for single-variable imputation that you could recommend instead?

Thanks for the reply

Thomas

In the case of one variable, MI is similar to bootstrap resampling. For each imputed sample, you can replace each missing value with a random value drawn from the observed (nonmissing) values. For example, when forest=1, your data has:

- 1 value of 0
- 1 value of 0.018552237
- 8 values of 0.070685835
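These counts come from the 10 observed values in the forest=1 group (the other 8 observations are missing). Dividing each count by 10 gives the empirical probabilities that the program below uses in its Prob array:

```latex
\hat p(0.070685835) = \tfrac{8}{10} = 0.8, \qquad
\hat p(0.018552237) = \tfrac{1}{10} = 0.1, \qquad
\hat p(0) = \tfrac{1}{10} = 0.1
```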

It's not clear to me what you want to do with the forest=0 data, which doesn't have missing values. Copy it over to each imputed set?

Anyway, for the forest=1 data, you can write a program such as the following to replace missing values with a random observed value:

```
/* initial distribution of values */
proc freq data=NFI;
   where forest=1;
   tables A_forest / missprint;
run;

/* multiple imputations of the forest=1 data */
data Impute;
   call streaminit(54321);
   /* observed values and their empirical proportions (8, 1, and 1 out of 10) */
   array Value[3] _temporary_ (0.070685835, 0.018552237, 0);
   array Prob[3]  _temporary_ (0.8, 0.1, 0.1);
   set NFI(where=(Forest=1));
   ObsNum = _N_;
   A_orig = A_forest;                    /* keep the original (possibly missing) value */
   do _Imputation_ = 1 to 5;
      A_forest = A_orig;                 /* reset before each imputation */
      if missing(A_orig) then do;
         i = rand("Table", of Prob[*]);  /* random index 1, 2, or 3 */
         A_forest = Value[i];
      end;
      output;
   end;
   drop A_orig i;
run;

proc sort data=Impute;
   by _Imputation_ ObsNum;
run;

/* final distribution of values across all imputed sets */
proc freq data=Impute;
   tables A_forest / missprint;
run;
```
