ciro
Quartz | Level 8

Dear community,

I am trying to figure out how to perform minimum-distance donor imputation within groups.

Unfortunately, it seems that PROC SURVEYIMPUTE does not do this (only random imputation).

Is there another direct option?

If not, can you suggest an appropriate way to do it?

One issue is that the data set is very large, with about 30 variables and 20-30 million records.

Any hint is greatly appreciated.

Thank you very much in advance.

3 REPLIES
ciro
Quartz | Level 8
And I am not very good with IML...
SteveDenham
Jade | Level 19

I may not understand all of the constraints for minimum distance donor imputation, but it sounds a lot like what PROC MI calls fully conditional specification (FCS) predictive mean matching. I base this on the Details section for this method in the PROC MI documentation. It looks to me like a predicted mean for the missing value is estimated via regression, and then the K observations with the closest predicted values are used as a donor set from which one value is randomly selected. Does that fit? I suppose you could get the minimum distance replacement value by setting K=1. The last two paragraphs in the Details point out the advantages and disadvantages of large and small K, and seem to imply that this method is more robust to departures from normality.
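To make the K=1 idea concrete, here is a minimal sketch in Python rather than SAS. It is not PROC MI's algorithm (which, as I read the documentation, also simulates the regression parameters before matching); it only shows the matching step. The function name `pmm_impute`, the single covariate, and the toy data are assumptions made up for illustration.

```python
import random


def pmm_impute(x, y, k=1, seed=0):
    """Predictive mean matching for y on one covariate x (illustrative only).

    Missing y values are given as None. Returns the completed list and a
    dict mapping each recipient row index to its donor row index.
    """
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]

    # Ordinary least squares fit of y on x, using observed rows only.
    n = len(obs)
    mean_x = sum(x[i] for i in obs) / n
    mean_y = sum(y[i] for i in obs) / n
    sxx = sum((x[i] - mean_x) ** 2 for i in obs)
    sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Predicted mean for every row, observed or not.
    pred = [intercept + slope * v for v in x]

    completed = list(y)
    donors = {}
    for i in mis:
        # Rank observed rows by distance between predicted means.
        ranked = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))
        donor = rng.choice(ranked[:k])  # with k=1 this is the closest row
        completed[i] = y[donor]         # copy the donor's observed value
        donors[i] = donor
    return completed, donors


x = [1, 2, 3.4, 4, 5]
y = [2.1, 3.9, None, 8.2, 9.8]
completed, donors = pmm_impute(x, y, k=1)
# Row 2 (x=3.4) is matched to row 3 (x=4), whose predicted mean is closest.
```

Note that the distance here is measured between predicted means, not between the original variables, so two records can be "close" without having similar covariate values once several covariates are involved.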

 

SteveDenham

ciro
Quartz | Level 8

Thank you Steve for the quick reply.

I had a brief look into it. However, the method I am trying to apply requires choosing the donors based on the actual values of the variables rather than on a synthetic measure such as the predictive mean. Moreover, I would like to have a data set with the ID of the chosen donor for each recipient.
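To pin down what I mean, here is a rough sketch of the logic in Python (I do not have the SAS code, which is the actual question). Everything named here is hypothetical: the function `nearest_donor_impute`, the variables `grp`, `age`, `income` and `y`, and the toy records. It uses plain Euclidean distance on unscaled variables and a brute-force search inside each group, so it would not scale to 20-30 million records as written.

```python
import math
from collections import defaultdict


def nearest_donor_impute(records, group_key, match_vars, target, id_key="id"):
    """Nearest-donor imputation within groups (illustrative sketch).

    records is a list of dicts; a missing target is None. Returns the
    completed records plus a list of (recipient_id, donor_id, distance).
    """
    # Donors are the records of each group with an observed target.
    donors_by_group = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors_by_group[rec[group_key]].append(rec)

    imputed, links = [], []
    for rec in records:
        new_rec = dict(rec)
        if rec[target] is None:
            pool = donors_by_group.get(rec[group_key], [])
            if pool:  # recipients with no donor in their group stay missing
                point = [rec[v] for v in match_vars]
                # Brute-force search for the closest donor in the same group.
                donor = min(
                    pool,
                    key=lambda d: math.dist(point, [d[v] for v in match_vars]),
                )
                new_rec[target] = donor[target]
                dist = math.dist(point, [donor[v] for v in match_vars])
                links.append((rec[id_key], donor[id_key], dist))
        imputed.append(new_rec)
    return imputed, links


records = [
    {"id": 1, "grp": "A", "age": 30, "income": 40, "y": 5.0},
    {"id": 2, "grp": "A", "age": 50, "income": 80, "y": 9.0},
    {"id": 3, "grp": "A", "age": 32, "income": 42, "y": None},
    {"id": 4, "grp": "B", "age": 31, "income": 41, "y": 7.0},
    {"id": 5, "grp": "B", "age": 60, "income": 90, "y": None},
]
imputed, links = nearest_donor_impute(records, "grp", ["age", "income"], "y")
# Recipient 3 gets donor 1: record 4 is closer overall but sits in group B.
```

The `links` list is the donor data set I am after: one row per recipient with the ID of the chosen donor. In practice the matching variables would probably need to be standardized first, and the per-group search replaced by something faster.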

 

