GuyTreepwood
Obsidian | Level 7

Hello,

 

I am running an experiment comparing the performance of several models using the area under the precision-recall curve (PR AUC). Beyond comparing the raw scores against each other, is there a more statistically rigorous way to compare them? Two candidate tests I read about were the Wilcoxon signed-rank and Mann-Whitney U tests, but I am not sure how to apply either to the results of my analysis.
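For concreteness, a small Python sketch of the metric in question (the thread itself is about SAS; the data here are randomly generated, and `scikit-learn` is just one common way to compute it outside SAS):

```python
# Hypothetical example: two ways to estimate a PR AUC in Python.
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve, auc

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)            # binary labels
y_score = y_true * 0.3 + rng.random(200) * 0.7   # noisy scores correlated with labels

# Average precision: a step-wise summary of the PR curve.
ap = average_precision_score(y_true, y_score)

# Trapezoidal area under the PR curve (a slightly different estimate).
precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)
```

Both quantities land in [0, 1]; the two estimates differ slightly because average precision does not interpolate between points on the curve.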

4 REPLIES
StatDave
SAS Super FREQ

As noted in the list of Frequently Asked-for Statistics (FASTats, see the Important Links section of the Statistical Procedures Community page), the precision-recall curve and the area under it can be displayed either using PROC LOGISTIC in SAS Viya or using the PRcurve macro in SAS 9. Unlike the ROC curve, however, no test is available to compare the areas under PR curves from competing models. Areas under ROC curves, on the other hand, can be compared using the ROCCONTRAST statement in PROC LOGISTIC in SAS 9 or SAS Viya; see the example in the PROC LOGISTIC documentation.

SteveDenham
Jade | Level 19

Unless you have multiple PR AUC estimates per model, or you can aggregate some models together, you probably won't be able to use the nonparametric tests you list. The same is true of the parametric tests that come to mind. One way or another, you need a measure of variability within each grouping variable/model before you can do any testing or construct confidence bounds.

 

SteveDenham

GuyTreepwood
Obsidian | Level 7

Hmmm...so what I did was train k models for the control and k models for every 'treatment' condition using k-fold cross-validation. I computed the PR AUC score on every test fold, then the mean PR AUC across all folds for each 'condition' (control and each treatment separately). I wanted to compare the mean PR AUC of each 'treatment' model against the control (to see whether there are real differences in the results) and to compare the 'treatment' model scores against each other. Does this setup not work for either the nonparametric or parametric comparison methods?
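A sketch of the paired comparison this setup allows, assuming the same k folds are used for the control and each treatment (the fold scores below are invented; `scipy` is used only to illustrate the mechanics):

```python
# Per-fold PR AUC scores from k = 10 shared CV folds (hypothetical numbers).
import numpy as np
from scipy.stats import wilcoxon

control_auc   = np.array([0.71, 0.68, 0.74, 0.70, 0.69, 0.72, 0.73, 0.67, 0.70, 0.71])
treatment_auc = np.array([0.74, 0.71, 0.75, 0.73, 0.70, 0.76, 0.74, 0.70, 0.72, 0.74])

# Because the folds are shared, the scores are paired: the Wilcoxon
# signed-rank test applies to the per-fold differences. (Mann-Whitney U
# is for independent samples, which these are not.)
stat, p_value = wilcoxon(treatment_auc, control_auc)
```

One caveat worth knowing: fold scores from the same cross-validation run are not fully independent (the training sets overlap), so p-values from tests on fold scores should be treated as approximate.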

StatDave
SAS Super FREQ
In order for the treatment PR curves to be independent, the treatment models must have been fit to independent groups of subjects. In that case, you could fit a single model to each and compare the ROC AUCs as described in SAS Note 45339 (http://support.sas.com/kb/45339). Note that PROC LOGISTIC doesn't provide a standard error for the PR AUC as it does for the ROC AUC.

If the same subjects were used for the treatments, then the PR or ROC curves would be dependent and, as described for dependent ROC AUCs in the LOGISTIC documentation, you would presumably need to estimate a covariance matrix among the AUC values.

Or take a different approach: if the goal is to compare the treatments to control, you could simply fit a single model that includes your multi-level treatment variable (which includes control) as a CLASS variable, possibly interacting it with any other predictors in the model. You could then use tests on the parameters, or contrasts of the parameters, to make inferences about differences among the treatments or about whether the effects of other predictors differ among the treatments.



Discussion stats
  • 4 replies
  • 1596 views
  • 4 likes
  • 3 in conversation