Thanks Koen! What you suggested is exactly the practical approach we had been following. However, it occurs to me that the holdout/test set is essentially being treated as a second validation set if we do it this way, since we end up picking our models based on test set performance rather than validation set performance. It is also quite a hassle to write macros to plot precision-recall curves and other performance metrics (e.g., ROC curves, confusion matrices) on the test set within EM, whether through the Cutoff node or SAS code.

One thing I want to point out is that the model picked on the sub-sample, as you suggested, sometimes (not always) performed quite poorly on the test/holdout dataset, due to the obvious bias introduced by subsampling (large variance within the original population). That is the real reason I wanted to calibrate the models against the test/holdout dataset. Any comments/suggestions? Thanks
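
In case it helps show what I mean by the hassle, here is a minimal sketch of the kind of code I end up writing by hand outside the EM flow. It assumes the Score node has exported the scored test partition to a table I'm calling work.scored_test, with a binary target named target and the posterior probability column P_target1 (all three names are placeholders for illustration, not actual output from my flow):

/* ROC on the scored test set; NOFIT skips refitting and just
   evaluates the ROC for the existing EM posterior probability */
proc logistic data=work.scored_test plots(only)=roc;
   model target(event='1') = P_target1 / nofit;
   roc 'EM posterior' pred=P_target1;
run;

/* Confusion matrix at an assumed 0.5 cutoff */
data work.scored_cut;
   set work.scored_test;
   predicted = (P_target1 >= 0.5);
run;

proc freq data=work.scored_cut;
   tables target*predicted / norow nocol nopercent;
run;

Doable, but it has to be repeated for every candidate model and every cutoff I want to inspect, which is why I keep wishing these plots were available on the test partition inside EM itself.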