levender8622
Fluorite | Level 6

I'm trying to calculate AUCs using predicted values obtained from multinomial logistic regression. However, because the prediction cohort carries weights that account for under-sampling, I need to calculate weighted AUCs.

Previously I used the SAS macro %MultAUC (https://support.sas.com/kb/64/029.html) for the unweighted AUC, but it doesn't incorporate weights. If someone could help with adding weights to this macro, or suggest any other way to do this, it would be greatly appreciated.

8 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

Are you sure you need the weights?

AUC will not change if the ranks of the observations remain the same.

For example, in the case of a binary target, ... adjusting posterior probabilities for the real (population) priors often means that the probabilities are adjusted downwards (that is, if the event was over-sampled or the non-event was under-sampled). However, that does not change the ranking of the observations, and therefore the AUC stays the same!
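
To illustrate the rank argument, here is a minimal Python sketch (not SAS, and not taken from any SAS macro). The function names `auc` and `adjust_for_priors`, the toy data, and the priors 0.5 and 0.05 are all made up for illustration. It rescales event probabilities from an assumed sample prior to an assumed population prior and checks that the c-statistic is unchanged, because the rescaling is strictly increasing in the probability.

```python
def auc(scores, labels):
    """c-statistic: share of event/non-event pairs ranked correctly (ties count 1/2)."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def adjust_for_priors(p, sample_prior, true_prior):
    """Rescale an event probability from the sample prior to the population prior."""
    num = p * true_prior / sample_prior
    den = num + (1.0 - p) * (1.0 - true_prior) / (1.0 - sample_prior)
    return num / den


# Toy data (hypothetical): 1 = event, 0 = non-event.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.7, 0.8]

# Assumed priors: events are 50% of the sample but 5% of the population.
adjusted = [adjust_for_priors(p, 0.5, 0.05) for p in probs]

auc_raw = auc(probs, labels)
auc_adjusted = auc(adjusted, labels)
print(auc_raw, auc_adjusted)
```

Every adjusted probability is lower than the original one, yet both AUC values agree. Note that this only covers a monotone rescaling of the probabilities; it says nothing about observation-level weights that differ within a class.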

Give it another thought to be sure you need weights.

Koen

Rick_SAS
SAS Super FREQ

This isn't my area, but if these are survey weights, I believe you can use PROC SURVEYLOGISTIC to perform generalized logit regression, which is the same as multinomial logistic regression. See this doc example. The doc mentions that you can obtain the rank correlation statistics, including the concordance index, c, which for a binary response is the area under the ROC curve. Unfortunately, I do not know whether that statistic is applicable to a multinomial response.

levender8622
Fluorite | Level 6

Thank you @Rick_SAS, but for my problem I already have the predicted probabilities for individuals without weights, and for some reason I can't get the weighted predicted probabilities (I'm pretty sure what I did was right, but I won't elaborate here, to avoid making things even more complicated). This is why I need a way to calculate AUC that takes the weights into account. To put the question simply, I think it eventually comes down to a weighted Wilcoxon rank-sum test. I saw some of your relevant posts previously and was wondering whether you know a way to do this. Thanks again.

StatDave
SAS Super FREQ

As Rick notes, if the weights are survey weights, then you can use PROC SURVEYLOGISTIC to fit an appropriate multinomial logistic model. As noted in the macro documentation, if you use the PREDPROBS= option in the OUTPUT statement, you can use the macro with SURVEYLOGISTIC in exactly the same way as with PROC LOGISTIC. Note that there is no need to specify weights in the macro, because the weights are accounted for in the fit of the model and adjust the predicted probabilities, which are the inputs to the macro. Similarly, if the weights are just importance weights, not from survey sampling, then the weights can be used in the WEIGHT statement in PROC LOGISTIC and the above still applies.

levender8622
Fluorite | Level 6

Thank you @StatDave, but as I said, I couldn't really add weights to my model because my analysis involves a training data set with no weights and a test data set with weights. I fit the multinomial regression on the training data and then get predicted values for my test data, and weights could not be incorporated in this case (or could they?). But then I need to get the AUC on my test data, which is why I need to consider the weights when calculating AUC, and this eventually comes down to the question of a weighted Wilcoxon rank-sum test. Please let me know if this makes sense to you, thank you.
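
To make concrete what a weighted c-statistic for the test data might look like, here is a minimal Python sketch (not SAS, and not part of %MultAUC). It treats the weighted AUC as the weighted share of concordant event/non-event pairs, with each pair weighted by the product of the two observation weights, which is one natural weighted analogue of the Mann-Whitney statistic. The function name `weighted_auc` and the toy data are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Weighted c-statistic: each event/non-event pair counts with weight w_i * w_j."""
    num = 0.0
    den = 0.0
    for s_i, y_i, w_i in zip(scores, labels, weights):
        if y_i != 1:
            continue
        for s_j, y_j, w_j in zip(scores, labels, weights):
            if y_j != 0:
                continue
            pair_w = w_i * w_j
            den += pair_w
            if s_i > s_j:
                num += pair_w          # concordant pair
            elif s_i == s_j:
                num += 0.5 * pair_w    # tied pair counts half
    return num / den


# Toy test data (hypothetical): 1 = event, 0 = non-event.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.7]

auc_unweighted = weighted_auc(scores, labels, [1] * 6)

# Weights that depend only on the class (every non-event counts 10 times).
auc_class_weights = weighted_auc(scores, labels, [1, 1, 1, 10, 10, 10])

# Weights that differ within a class (the last non-event counts 5 times).
auc_obs_weights = weighted_auc(scores, labels, [1, 1, 1, 1, 1, 5])

# Same as physically replicating the last observation 5 times.
auc_replicated = weighted_auc(scores + [0.7] * 4, labels + [0] * 4, [1] * 10)

print(auc_unweighted, auc_class_weights, auc_obs_weights, auc_replicated)
```

In this toy example, weights that are constant within each class leave the AUC at 2/3, while weights that vary within a class change it to 10/21. Integer weights give the same value as replicating the observations. So whether the test-set weights matter depends on whether they vary within outcome classes.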

StatDave
SAS Super FREQ

The proper way to use the weights is in the model fit. If you used the weights when fitting the model to your training data, then the model parameter estimates are adjusted for the weights and they then produce the desired predicted probabilities for both the training and the test data. So again, there is no reason to further involve the weights in the computations from the macro.

sbxkoenk
SAS Super FREQ

Hello,

I agree with @StatDave.

Also, see here:

Usage Note 39109: Measures and tests of the discriminatory power of a binary logistic model
https://support.sas.com/kb/39/109.html

You can indeed calculate the c-statistic (AUC for a binary target) from the Wilcoxon rank-sum test or, equivalently, the Mann-Whitney U test (c is the U statistic divided by the number of event/non-event pairs), but PROC NPAR1WAY does not have a WEIGHT statement.
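
For the multinomial case, one common multiclass AUC is the Hand and Till (2001) measure M, the average of pairwise AUCs over all pairs of classes. Whether this matches the %MultAUC output should be checked against the macro documentation. Below is a minimal Python sketch of a weighted version, assuming each pair of observations is weighted by the product of the two observation weights. The function names, the data layout (one dict of class probabilities per observation), and the toy data are all hypothetical.

```python
from itertools import combinations


def weighted_concordance(hi_scores, hi_w, lo_scores, lo_w):
    """Weighted probability that a 'hi' member outscores a 'lo' member (ties count 1/2)."""
    num = 0.0
    den = 0.0
    for s_i, w_i in zip(hi_scores, hi_w):
        for s_j, w_j in zip(lo_scores, lo_w):
            pair_w = w_i * w_j
            den += pair_w
            if s_i > s_j:
                num += pair_w
            elif s_i == s_j:
                num += 0.5 * pair_w
    return num / den


def weighted_multiclass_auc(probs, labels, weights):
    """Hand-and-Till-style M: mean over class pairs of (A(a|b) + A(b|a)) / 2."""
    classes = sorted(set(labels))
    total = 0.0
    for a, b in combinations(classes, 2):
        idx_a = [k for k, y in enumerate(labels) if y == a]
        idx_b = [k for k, y in enumerate(labels) if y == b]
        w_a = [weights[k] for k in idx_a]
        w_b = [weights[k] for k in idx_b]
        # A(a|b): class-a members should have higher P(a) than class-b members.
        a_given_b = weighted_concordance(
            [probs[k][a] for k in idx_a], w_a, [probs[k][a] for k in idx_b], w_b)
        # A(b|a): class-b members should have higher P(b) than class-a members.
        b_given_a = weighted_concordance(
            [probs[k][b] for k in idx_b], w_b, [probs[k][b] for k in idx_a], w_a)
        total += 0.5 * (a_given_b + b_given_a)
    n = len(classes)
    return 2.0 * total / (n * (n - 1))


# Toy test data (hypothetical): true class and predicted class probabilities.
labels = ["A", "A", "B", "B", "C", "C"]
probs = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.3, "B": 0.5, "C": 0.2},
    {"A": 0.2, "B": 0.6, "C": 0.2},
    {"A": 0.4, "B": 0.4, "C": 0.2},
    {"A": 0.1, "B": 0.2, "C": 0.7},
    {"A": 0.2, "B": 0.3, "C": 0.5},
]

m_unweighted = weighted_multiclass_auc(probs, labels, [1] * 6)
# Up-weight the second observation, a class-A case that the model ranks poorly.
m_weighted = weighted_multiclass_auc(probs, labels, [1, 4, 1, 1, 1, 1])
print(m_unweighted, m_weighted)
```

With unit weights this reduces to the ordinary pairwise average. Up-weighting the poorly ranked observation lowers M in this example, which shows how observation-level weights can change the result even when the predicted probabilities are untouched.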

Cheers,

Koen
