SlutskyFan
Obsidian | Level 7
I just read the section on M-estimators in SAS's reference to 'Understanding Robust and Exploratory Data Analysis' (Hoaglin, Mosteller, & Tukey), and I have a good understanding of M-estimators. I understand their advantages over, say, mean imputation, but does anyone have advice on when M-estimators would be better or worse than, say, tree imputation or distribution methods?
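To make the M-estimator versus mean comparison concrete, here is a minimal Python sketch of single-value imputation using Huber's M-estimator of location. It uses iteratively reweighted averaging, with the scale held fixed at the MAD rescaled by 0.6745. Python, the tuning constant c = 1.345 (a common textbook choice), and the helper names `mad_scale`, `huber_location`, and `impute_location` are all assumptions made for illustration. They are not SAS defaults or a SAS API.

```python
from statistics import mean, median


def mad_scale(xs):
    """Median absolute deviation, divided by 0.6745 to estimate sigma under normality."""
    m = median(xs)
    return median(abs(x - m) for x in xs) / 0.6745


def huber_location(xs, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging (fixed scale)."""
    mu = median(xs)   # robust starting value
    s = mad_scale(xs)
    if s == 0:
        return mu     # more than half the values are identical
    for _ in range(max_iter):
        # Huber weights: full weight for small residuals, c/|r| for large ones.
        weights = []
        for x in xs:
            r = abs(x - mu) / s
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def impute_location(values, estimator):
    """Replace every None with one location estimate computed from the observed values."""
    observed = [v for v in values if v is not None]
    fill = estimator(observed)
    return [fill if v is None else v for v in values]


data = [2.1, 2.4, None, 2.2, 2.6, 25.0, None, 2.3]
mean_filled = impute_location(data, mean)             # fill value is about 6.1
huber_filled = impute_location(data, huber_location)  # fill value is about 2.4
```

With one wild value (25.0) among the observed data, the mean pulls the fill value up to about 6.1, while the Huber estimate stays with the bulk of the data. In both cases, though, every missing value gets the same constant.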

Are there any references on the pros and cons of each of these methods? It just seems more natural to me to use tree imputation than a point estimate like the mean, or even an M-estimator, however good its resistance and efficiency are. Tree imputation seems 'more informed', since it uses the other variables to predict each missing value. (Although I suppose, theoretically, a lot of information is captured by a mean or an M-estimator of location, in the sense that most of the observations should be centered around these points.)
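For contrast with a single point estimate, a tree-style imputation predicts each missing value from other variables. The sketch below is a deliberately tiny version. It fits a depth-one regression tree (a "stump") on one fully observed predictor `x`, picks the split that minimizes squared error in the observed `y` values, and fills each missing `y` with its leaf mean. It is an illustration under those assumptions, not the Enterprise Miner tree imputation, and the function names are hypothetical.

```python
from statistics import mean


def best_split(xs, ys):
    """Depth-one regression tree: return (threshold, left_mean, right_mean)
    for the split on x that minimizes the total squared error of y."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lm, rm)
    if best is None:
        # No valid split (e.g., only one distinct x): fall back to the plain mean.
        m = mean(ys)
        return float("inf"), m, m
    return best[1], best[2], best[3]


def stump_impute(xs, ys):
    """Fill None entries in ys with the leaf mean for their x value."""
    obs_x = [x for x, y in zip(xs, ys) if y is not None]
    obs_y = [y for y in ys if y is not None]
    threshold, left_mean, right_mean = best_split(obs_x, obs_y)
    return [y if y is not None else (left_mean if x <= threshold else right_mean)
            for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 10, 11, 12, 13, 2.5, 11.5]
ys = [5, 6, 5, 6, 20, 21, 20, 21, None, None]
filled = stump_impute(xs, ys)
```

Here the split lands at x = 7, so the two missing values get 5.5 and 20.5. Mean imputation would give both of them 13.0, the mean of all observed `y`.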

Any thoughts?
topkatz
Obsidian | Level 7
Hi.

I did a quick web search for comparisons of imputation methods. There were a bunch of journal articles that I didn't want to buy, but the abstracts all said similar things: imputing was better than not imputing, and multivariate methods outperformed univariate methods.

SAS software seems to be lagging the state of the art in imputation by about a decade. I think their last serious improvement for imputation was when they added PROC MI to SAS/STAT about ten years ago (and that methodology had already been around for twenty years at that time). Enterprise Miner doesn't appear to offer expectation maximization for multiple imputation, but it has a few methods not available in SAS/STAT, notably tree imputation, as you mentioned.

I once read a pretty convincing endorsement of cluster imputation by one of the eminent senior statisticians at SAS, Warren Sarle. I wish I could find it; I'd copy it here. Cluster imputation is a kind of compromise between univariate and multivariate methods. Finding the clusters is a multivariate technique, but once you have the clusters, you simply substitute the cluster mean or median for the missing values of observations within each cluster (I suppose you could use M-estimators within each cluster, if you wanted to). You can get cluster imputation in both SAS/STAT and Enterprise Miner, but you have to know where to look: in SAS/STAT it's in PROC FASTCLUS; in Enterprise Miner, it's in the Cluster node, not the Impute node.
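Here is a minimal Python sketch of the cluster imputation idea described above, under a few stated assumptions. Clusters are found with plain k-means on variables that are fully observed, using farthest-first initialization, a choice made here only for reproducibility. Each missing value of a separate target variable is then replaced by the median of the observed targets in the same cluster. This is not how PROC FASTCLUS or the Enterprise Miner Cluster node work internally, and the function names are hypothetical.

```python
import math
from statistics import median


def kmeans(points, k, iters=100):
    """Plain k-means on tuples of floats; returns (labels, centers)."""
    # Farthest-first initialization: start from the first point, then keep
    # adding the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def cluster_impute(points, target, k, stat=median):
    """Fill None entries in `target` with `stat` of the observed targets in the same cluster."""
    labels, _ = kmeans(points, k)
    all_observed = [t for t in target if t is not None]
    fills = {}
    for j in set(labels):
        observed = [t for t, lab in zip(target, labels) if lab == j and t is not None]
        # Fall back to the overall statistic if a cluster has no observed target.
        fills[j] = stat(observed) if observed else stat(all_observed)
    return [fills[lab] if t is None else t for t, lab in zip(target, labels)]


points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2)]
target = [10, 12, 11, None, 50, 52, None, 51]
filled = cluster_impute(points, target, k=2)
```

The two gaps get 11 and 51, one per cluster, whereas an overall median or mean would fill both with 31, a value between the two groups. Because `stat` is just a function, an M-estimator of location could be passed in place of `median`, which is the variation the post suggests.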
SlutskyFan
Obsidian | Level 7
Thanks! That was very helpful.
