deleted_user
Hi,

I am trying to decide whether imputing the data with PROC MI is a good option.

I have 8 numeric variables, 6 of which have less than 5% missing values.

Of the remaining 2 variables, one has almost 50% missing values and the other has 18%. Is it advisable to use PROC MI to impute values for a variable with roughly 50% missing values?

Thanks,

L
Doc_Duke
Lisa,

I don't think that there is a threshold. You do need to consider "why" you have so many missing values in those two variables. If the reason does not fall into the MAR (missing at random) or MCAR (missing completely at random) categories, then you may be better off explicitly modeling "missingness" with an indicator variable.

Doc Muhlbaier
Duke
deleted_user
I am working on survey data, and values are missing for two reasons:

1. The person skipped the question.
2. The question was not applicable to the respondent.
Doc_Duke
Personally, I would be concerned that neither of them fits the PROC MI assumptions. Skipping can happen because the person didn't see the question or for some other extraneous reason (MCAR), but it can also happen because the person found the question intrusive (income, for instance), in which case PROC MI would not be appropriate. "Not applicable" is definitely an answer that needs to be modeled; using PROC MI to impute another value in its place is going to bias the results.
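
One way to keep the two reasons apart before any imputation is to recode each raw answer into a value plus a status. This is only a rough sketch in Python (not SAS), and the raw codes "skipped" and "n/a" are hypothetical, since the thread does not say how the survey file encodes them.

```python
# Hypothetical raw codes; the actual survey coding is not given in the thread.
SKIPPED = "skipped"
NOT_APPLICABLE = "n/a"

def recode(raw):
    """Return (value, status) for one raw survey answer.

    status is "observed", "skipped", or "not_applicable".
    Only "skipped" answers are candidates for imputation;
    "not_applicable" is kept as its own category to be modeled.
    """
    if raw == NOT_APPLICABLE:
        return None, "not_applicable"
    if raw is None or raw == SKIPPED:
        return None, "skipped"
    return float(raw), "observed"

# Made-up answers to a single question, for illustration only.
raw_answers = [4, "n/a", "skipped", 2, 5, "n/a"]
recoded = [recode(r) for r in raw_answers]

# Positions that an imputation step would be allowed to fill.
to_impute = [i for i, (_, status) in enumerate(recoded) if status == "skipped"]
# Positions that should stay a separate "not applicable" category.
keep_as_category = [i for i, (_, status) in enumerate(recoded) if status == "not_applicable"]
```

With the statuses separated like this, only the skipped answers would be passed to an imputation step, and the "not applicable" rows stay identifiable for modeling.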
deleted_user
I really don't want to lose any observations.

So, to avoid that, I am currently substituting the mean wherever a value is missing, the question was not applicable, or the customer had no experience with the item.

Is using PROC MI a better option than that?
Doc_Duke
There is a fair amount of literature to indicate that mean substitution is one of the worst methods of imputation. PROC MI may well be better even with the assumptions violated.
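
One commonly cited problem with mean substitution is that it shrinks the variable's spread, because every filled-in value sits exactly at the mean. A small Python sketch with made-up numbers (half the values missing, loosely mirroring the worst variable in this thread) shows the effect; the data are purely illustrative.

```python
import statistics

# Made-up observed values; the other half of the sample is treated as missing.
observed = [2.0, 4.0, 6.0, 8.0]
n_missing = 4

mean_obs = statistics.mean(observed)

# Mean substitution: every missing entry is replaced by the observed mean.
filled = observed + [mean_obs] * n_missing

sd_observed = statistics.stdev(observed)  # spread of the real answers
sd_filled = statistics.stdev(filled)      # spread after mean substitution

# The mean is unchanged, but the standard deviation drops,
# so standard errors computed from the filled data look too small.
shrinkage = sd_filled / sd_observed
```

The mean is preserved while the standard deviation is understated, which is why tests and confidence intervals based on mean-substituted data tend to look more precise than they should.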

You may want to consider a combination approach: explicitly model missingness for the two variables with the most missing values (that adds two indicator variables to the model) and use PROC MI for the other 6.
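
As a rough illustration of the indicator idea, here is a minimal Python sketch (not SAS). The helper name add_missing_indicator and the sample values are hypothetical; in practice the same thing would be done in a DATA step before fitting the model.

```python
def add_missing_indicator(values, fill=0.0):
    """Return (filled, indicator) for one variable.

    indicator[i] is 1 where the original value was missing (None), else 0.
    filled[i] holds the original value, or the constant `fill` where missing.
    """
    indicator = [1 if v is None else 0 for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator

# Made-up variable with about half of its values missing.
x = [3.0, None, 5.0, None, 4.0, None]
x_filled, x_missing = add_missing_indicator(x)

# Both columns would then enter the model together:
# x_filled as the predictor, x_missing as the extra indicator variable.
```

In a linear model that includes both columns, the choice of fill constant should only shift the indicator's coefficient, not the slope on the variable itself, so 0 is used here purely for convenience.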

If you reach different conclusions with listwise deletion, mean substitution, and PROC MI, then you need to look further into the missingness mechanisms to understand the story your data are trying to tell.
deleted_user
I tried running PROC MI on all the variables, and the results are close to what I get without any imputation.

With mean substitution, the results are not as close as with PROC MI.

How should I compare which one performs better?

Thanks,

L
