bollibompa
Quartz | Level 8

Hi,

I am new to multiple imputation and I am trying to impute data in two different variables.

In my dataset I have 10 variables; 8 of them are complete and 2 contain missing data. Of these two, variable1 contains continuous data and variable2 is categorical (0/1).

About 20% of the data for variable1 (continuous) is missing. Values in this variable range from 1 to 70.

When I impute, I receive some negative values in variable1. Am I doing something wrong in the settings for PROC MI, or can imputed values be negative even if all observed values are positive?

Thanks!

/Thomas

4 REPLIES
Rick_SAS
SAS Super FREQ

The MCMC algorithm, which is the default method, assumes multivariate normality (MVN). A normal distribution is unbounded, which is why you are getting negative values. Look at the documentation for the TRANSFORM statement and hopefully you can choose a transformation that makes your data approximately MVN.
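
As a rough sketch of what that might look like: the data set name mydata and the complete variables x1-x8 below are placeholders, not names from the original post, and a log transformation is only one possible choice (it needs strictly positive values, which a 1 to 70 range satisfies).

```sas
/* Sketch only: mydata and x1-x8 are hypothetical names */
proc mi data=mydata nimpute=5 seed=20240 out=mi_out;
   transform log(variable1);   /* impute variable1 on the log scale */
   mcmc;                       /* MCMC method, assumes multivariate normality */
   var x1-x8 variable1;        /* variables in the imputation model */
run;
```

PROC MI reverse-transforms the imputed values when it writes the OUT= data set, so variable1 should come back on its original, positive scale.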

bollibompa
Quartz | Level 8

Thanks!!

This worked!

Can I use the MCMC algorithm for categorical data as well, or does MCMC only work for continuous data?

/T

Rick_SAS
SAS Super FREQ

A loooong time ago, Paul Allison wrote about how to do this:

http://www2.sas.com/proceedings/sugi30/113-30.pdf

Since then, there have been a LOT of additions to PROC MI.  Look in the "Details" section of the doc for "imputing CLASS variables," such as here

SAS/STAT(R) 14.1 User's Guide
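
For illustration, one of those newer options is the FCS statement combined with a CLASS statement. The following is a minimal sketch, not a recommendation for your data: mydata and x1-x8 are placeholder names, and the choice of a logistic model for variable2 and a regression model for variable1 is an assumption.

```sas
/* Sketch only: mydata and x1-x8 are hypothetical names */
proc mi data=mydata nimpute=5 seed=20240 out=mi_fcs;
   class variable2;                        /* treat the 0/1 variable as categorical */
   fcs logistic(variable2) reg(variable1); /* logistic model for variable2, linear regression for variable1 */
   var x1-x8 variable1 variable2;
run;
```

Note that the regression method for variable1 can still produce values outside the observed range, for the same reason as with MCMC.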

ohcomeon
Fluorite | Level 6

You want to be sure that your imputation model is compatible with your analysis model. If you log-transform X in the imputation model but don't log-transform X in your analysis model, your regression estimates will be biased. If you don't plan to log-transform X in the analysis, it is typically better not to log-transform X in your imputation model, even if that means some imputed values are out of bounds.

 

For details see this paper: https://arxiv.org/abs/1707.05360
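
To make that concrete, here is a hedged sketch in which variable1 is left untransformed in both steps. The names mydata and x1-x8 are placeholders, and treating x8 as the outcome of a linear regression is purely an assumption for illustration; the original post does not say what the analysis model is.

```sas
/* Sketch only: mydata, x1-x8 and the choice of x8 as outcome are hypothetical */

/* Impute variable1 on its original scale (no TRANSFORM statement) */
proc mi data=mydata nimpute=5 seed=20240 out=mi_raw;
   mcmc;
   var x1-x8 variable1;
run;

/* Fit the same untransformed model in each imputed data set */
proc reg data=mi_raw outest=est covout noprint;
   model x8 = variable1 x1;
   by _Imputation_;
run;

/* Combine the per-imputation estimates */
proc mianalyze data=est;
   modeleffects Intercept variable1 x1;
run;
```

Some imputed values of variable1 may fall below 1 here; the point above is that this can be preferable to imputing on a scale the analysis model does not use.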
