Hello,
I am running an experiment in which I compare the performance of various models using the area under the precision-recall curve (PR AUC). Beyond comparing the raw scores against each other, is there a way to compare them in a more statistically rigorous manner? Some candidate tests I read about were the Wilcoxon signed-rank and Mann-Whitney U tests, but I am not sure how to apply either to the results of my analysis.
As noted in the list of Frequently Asked-for Statistics (FASTats; see the Important Links section of the Statistical Procedures Community page), the precision-recall curve and the area under it can be displayed using either PROC LOGISTIC in SAS Viya or the PRcurve macro in SAS 9. Unlike for the ROC curve, however, no test is available to compare the areas under PR curves from competing models. Areas under ROC curves can be compared using the ROCCONTRAST statement in PROC LOGISTIC in SAS 9 or SAS Viya; see the example in the PROC LOGISTIC documentation.
Unless you have multiple PR AUC estimates per model, or you can aggregate some models together, you probably won't be able to use the nonparametric tests you list. The same would be true of the parametric tests that come to my mind. Somehow, you need to find a measure of variability within your grouping variable/model in order to do any testing or generation of confidence bounds.
SteveDenham
Hmmm... so what I did was train k models for the control and k models for every 'treatment' condition using k-fold cross-validation. I calculated the PR AUC score on every test fold, then calculated the mean PR AUC score across all folds for every model 'condition' I trained (the control, and each treatment separately). I want to compare the mean PR AUC score of each 'treatment' model against the control (to see whether there are real differences in the results) and also compare the 'treatment' models against each other. Does the setup I described above not work for either the nonparametric or the parametric comparison methods?
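To illustrate how the per-fold scores described above could feed a paired test, here is a minimal sketch in Python (not SAS) of an exact two-sided Wilcoxon signed-rank test on the fold-by-fold differences between one treatment and the control. It assumes both conditions were scored on the same k test folds, so the scores can be paired by fold. The function name signed_rank_exact and the PR AUC values are hypothetical and only there to make the example run. Note also that cross-validation folds share training data and so are not independent, which means the p-value should be read as approximate at best.

```python
from itertools import product


def signed_rank_exact(control, treatment):
    """Exact two-sided Wilcoxon signed-rank test for paired per-fold scores.

    Returns (W+, p-value). Enumerates all 2^n sign patterns, so it is only
    practical for a small number of folds.
    """
    # Paired differences, dropping exact zeros as the classical test does.
    diffs = [t - c for c, t in zip(control, treatment) if t != c]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed = abs(w_plus - expected)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical PR AUC scores on the same 5 test folds (made-up numbers).
control = [0.61, 0.58, 0.64, 0.60, 0.62]
treatment = [0.66, 0.60, 0.65, 0.67, 0.65]

w_plus, p_value = signed_rank_exact(control, treatment)
print(f"W+ = {w_plus}, two-sided p = {p_value}")
```

In this made-up example the treatment beats the control on every fold, which is the most extreme outcome possible, yet the two-sided p-value is only 2/32 = 0.0625. With k = 5 folds the signed-rank test cannot go below that value, so more folds (or repeated cross-validation) would be needed before a test like this could reach the usual 0.05 level. Comparing several treatments against the control and against each other would also call for a multiple-comparison adjustment.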