ciro
Quartz | Level 8

Dear community,

I am trying to figure out how to perform donor imputation by minimum distance within groups.

Unfortunately, it seems that PROC SURVEYIMPUTE does not do this (it only offers random imputation).

Is there another direct option?

If not, can you suggest appropriate ways to do it?

One issue is that the data set is very large, with about 30 variables and 20-30 million records.

Any hint is greatly appreciated.

Thank you very much in advance.

3 REPLIES
ciro
Quartz | Level 8
And I am not very good with IML...
SteveDenham
Jade | Level 19

I may not understand all of the constraints for minimum distance donor imputation, but it sounds a lot like what PROC MI calls fully conditional specification (FCS) predictive mean matching. I am basing this on the Details section for this method in the PROC MI documentation. It looks to me like a predicted mean for the missing value is estimated via regression, and then the K observations with the closest predicted values are used as a basis set from which a value is randomly selected. Does that fit? I suppose you could find the minimum distance replacement value by setting K=1. The last two paragraphs in the Details point out the advantages and disadvantages of large and small K, and seem to imply that this method is more robust to violations of the normality assumption.
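
To make the mechanics concrete, here is a minimal sketch of predictive mean matching as described above, written in Python rather than SAS and using a single predictor. The function name `pmm_impute` and the toy data are hypothetical, and this is not PROC MI's implementation: as far as I understand it, PROC MI also draws the regression parameters from their posterior before predicting, which this sketch leaves out.

```python
import random


def pmm_impute(x, y, k=5, seed=0):
    """Simplified predictive mean matching: one predictor x, target y (None = missing)."""
    rng = random.Random(seed)

    # Fit ordinary least squares y = intercept + slope * x on the observed cases.
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mean_x = sum(xi for xi, _ in obs) / n
    mean_y = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    result = []
    for xi, yi in zip(x, y):
        if yi is not None:
            result.append(yi)
            continue
        target = predict(xi)
        # Basis set: the k observed cases whose predicted means are closest.
        closest = sorted(obs, key=lambda o: abs(predict(o[0]) - target))[:k]
        # Impute an observed value drawn at random from the basis set.
        result.append(rng.choice(closest)[1])
    return result


x = [1, 2, 3, 4, 2.1]
y = [10, 20, 30, 40, None]
print(pmm_impute(x, y, k=1))  # -> [10, 20, 30, 40, 20]
```

With K=1 the random draw disappears and the single closest case always donates its value. Note that "closest" is still measured on the predicted mean, not on the raw variables.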

 

SteveDenham

ciro
Quartz | Level 8

Thank you Steve for the quick reply.

I had a brief look into it. However, the method I am trying to apply requires choosing the donors based on the actual values of the variables rather than on a synthetic measure such as the predictive mean. Moreover, I would like to have a dataset with the ID of the chosen donor for each recipient.
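
The requirement above can be sketched as follows. This is an illustration in Python rather than SAS, with hypothetical names throughout (`nearest_donor_impute`, `region`, `age`, `size`, `income`). It assumes Euclidean distance on the raw matching variables and no missing values in those variables. In practice the variables would probably need to be standardized first so that no single one dominates the distance.

```python
from collections import defaultdict
from math import sqrt


def nearest_donor_impute(records, group_key, match_vars, target, id_key="id"):
    """Fill missing `target` values from the closest complete record in the same group.

    Returns (imputed_records, donor_map), where donor_map holds one
    (recipient_id, donor_id, distance) tuple per imputed record.
    """
    # Donor pool per group: records whose target value is observed.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[group_key]].append(rec)

    imputed, donor_map = [], []
    for rec in records:
        out = dict(rec)
        pool = donors.get(rec[group_key], [])
        if rec[target] is None and pool:
            # Euclidean distance on the actual matching variables.
            def dist(d, r=rec):
                return sqrt(sum((d[v] - r[v]) ** 2 for v in match_vars))

            best = min(pool, key=dist)
            out[target] = best[target]
            donor_map.append((rec[id_key], best[id_key], dist(best)))
        # Recipients in a group with no donors are left missing.
        imputed.append(out)
    return imputed, donor_map


records = [
    {"id": 1, "region": "A", "age": 30, "size": 2, "income": 100.0},
    {"id": 2, "region": "A", "age": 50, "size": 4, "income": 200.0},
    {"id": 3, "region": "A", "age": 33, "size": 2, "income": None},
    {"id": 4, "region": "B", "age": 31, "size": 2, "income": 300.0},
    {"id": 5, "region": "B", "age": 60, "size": 1, "income": None},
]
imputed, donor_map = nearest_donor_impute(records, "region", ["age", "size"], "income")
print([(r, d) for r, d, _ in donor_map])  # -> [(3, 1), (5, 4)]
```

The `donor_map` list plays the role of the requested recipient-to-donor dataset. The search here is brute force: every recipient is compared with every donor in its group. With 20-30 million records that is only workable if the groups are small; otherwise some form of indexing or blocking within groups would likely be needed.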

 
