
## PROC MI - How do I impute data conditioned on other data?

Hi all,

(First post, new to SAS, using SAS University edition)

Big picture:

I am trying to run a multivariate, non-linear regression on this data set, i.e. using multiple predictor variables (power, volume) to develop parametric equations that estimate a dependent variable (mass). However, some data points are missing: not all the information I want is available online, so some systems list mass and power, others list mass and volume, others list all three, and so on. Currently, I am running a complete-case analysis (only using the cases where mass, volume, and power are all known). However, this vastly shrinks my available data set and leads to large confidence intervals for the estimated coefficients.

Goal: Use multiple imputation to impute the missing values, then re-run the multivariate, non-linear regression code (currently in MATLAB) so that the coefficients in the parametric equations have less variance.

Currently using:

```sas
proc mi data=Work.IMPORT nimpute=10 seed=54321 mu0=313.5 219.2 1275 10.14 796 minimum=0 out=mi_mvn;
   mcmc chain=multiple displayinit initial=em(itprint);
run;
```

Problem: Some of the imputed values for dry mass are larger than the corresponding known wet mass. Is there a way to condition the imputation so that the imputed dry mass stays below the wet mass? (Does PROC MI just generate random guesses based on the observed means and standard deviations? I would hope that it uses the known data points to impute the missing ones effectively, but that doesn't seem to be what's happening.)


## Re: PROC MI - How do I impute data conditioned on other data?

From the online documentation in the Overview of MI procedure:

Multiple imputation does not attempt to estimate each missing value through simulated values. Instead, it draws a random sample of the missing values from its distribution. This process results in valid statistical inferences that properly reflect the uncertainty due to missing values—for example, confidence intervals with the correct probability coverage.

You may want to request a larger number of imputations and then filter the generated data so that only rows with dry mass < wet mass are kept.
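A minimal sketch of that suggestion, assuming the two columns are named `dry_mass` and `wet_mass` (hypothetical names, since the thread does not give the actual variable names in `Work.IMPORT`). The PROC MI call is the original one with only `nimpute` raised; the value 50 is an arbitrary illustration, not a recommendation.

```sas
/* Same call as in the question, but with more imputations requested. */
proc mi data=Work.IMPORT nimpute=50 seed=54321
        mu0=313.5 219.2 1275 10.14 796 minimum=0 out=mi_mvn;
   mcmc chain=multiple displayinit initial=em(itprint);
run;

/* dry_mass and wet_mass are assumed (hypothetical) variable names;   */
/* replace them with the real column names.                           */
/* The subsetting IF keeps only rows where dry mass is below wet mass. */
data mi_filtered;
   set mi_mvn;
   if dry_mass < wet_mass;
run;

/* Count how many rows survive in each imputed data set.              */
/* _Imputation_ is the index variable PROC MI adds to the OUT= data.  */
proc freq data=mi_filtered;
   tables _Imputation_;
run;
```

Note that filtering removes whole rows, so the imputed data sets may end up with different numbers of observations. The PROC FREQ step shows how much each one shrank before the filtered data is passed on to the regression.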
