How to determine the one optimal decision threshold across multiple predictive models

JamieTee · Posted 05-23-2020 10:59 PM

I want to build predictive models to identify patients who may have been misdiagnosed with a certain disease. The catch is that I have multiple databases that I do not want to combine into one data set, so I need separate predictive models for each database. On top of that, I want to try several modeling approaches and keep the most accurate one.

My current problem is finding the optimal decision threshold for classifying each model's predictions as misdiagnosed (p=1) or not misdiagnosed (p=0). I've read that Youden's Index can be used to find the optimal cutoff for a single model, but since I'll have multiple models for each database, would it make sense to use the SAME cutoff across all models and databases, or one cutoff per database (kept consistent across the models within that database)? I'm a bit lost on what the best approach is, and I haven't been able to find papers that explain how they determine optimal decision thresholds for multiple models at the same time.
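Youden's J for a cutoff t is sensitivity(t) + specificity(t) - 1, and the usual recipe picks the t that maximizes it. Below is a minimal Python sketch, assuming each model outputs a predicted probability of misdiagnosis on held-out validation data (cutoffs tuned on training data tend to be optimistic). It computes a per-model cutoff and, for comparison, one shared cutoff chosen to maximize the average J across all models. The mean-J criterion, the function names, the database/model keys and the toy data are hypothetical illustrations, not an established standard.

```python
# Minimal sketch: Youden's J cutoff per model, plus one possible shared cutoff.
# Database names, model names and data below are hypothetical placeholders.

def youden_j(y_true, scores, threshold):
    """Sensitivity + specificity - 1 when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn) + tn / (tn + fp) - 1


def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing Youden's J over the observed scores."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        j = youden_j(y_true, scores, t)
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


def shared_threshold(models):
    """One cutoff for all models, maximizing the mean J (one possible criterion)."""
    candidates = sorted({s for _, scores in models.values() for s in scores})
    best_t, best_mean = None, float("-inf")
    for t in candidates:
        mean_j = sum(youden_j(y, s, t) for y, s in models.values()) / len(models)
        if mean_j > best_mean:
            best_t, best_mean = t, mean_j
    return best_t, best_mean


# Hypothetical validation data: (true labels, predicted probabilities) per model.
models = {
    ("db1", "model_a"): ([0, 0, 0, 0, 0, 1, 1, 1],
                         [0.1, 0.2, 0.3, 0.5, 0.7, 0.4, 0.6, 0.9]),
    ("db2", "model_b"): ([0, 0, 0, 0, 0, 1, 1, 1],
                         [0.25, 0.35, 0.45, 0.55, 0.75, 0.65, 0.85, 0.95]),
}

shared_t, _ = shared_threshold(models)
for key, (y, s) in models.items():
    own_t, own_j = youden_threshold(y, s)
    print(key, f"own cutoff={own_t} J={own_j:.3f}",
          f"shared cutoff={shared_t} J={youden_j(y, s, shared_t):.3f}")
```

On this toy data the shared cutoff (0.6) leaves model_b's J unchanged at 0.8 but lowers model_a's from 0.6 to about 0.47, which is the kind of trade-off a single cutoff imposes when models' scores sit on different scales.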

P.S. I've read that the cost of each outcome (TP, TN, FP, FN) can be used to determine the threshold, but these costs are unknown for my disease of interest.
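Even when the individual costs are unknown, the cost-minimizing cutoff depends only on the ratio of the false-positive to the false-negative cost (assuming correct predictions cost nothing), so one option is to try a range of plausible ratios and see how far the cutoff moves. A minimal Python sketch under those assumptions; the cost pairs, data and function names are hypothetical. `calibrated_threshold` uses the standard result that, for well-calibrated probabilities p, predicting 1 when p > c_FP / (c_FP + c_FN) minimizes expected cost.

```python
# Minimal sketch: choosing a cutoff by minimizing misclassification cost.
# Real costs are unknown, so only hypothetical FP:FN cost pairs are tried.

def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Average cost per patient; correct predictions (TP, TN) are assumed free."""
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


def min_cost_threshold(y_true, scores, cost_fp, cost_fn):
    """Observed score with the lowest expected cost (lowest cutoff wins ties)."""
    return min(sorted(set(scores)),
               key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn))


def calibrated_threshold(cost_fp, cost_fn):
    """Cost-minimizing cutoff if the scores are well-calibrated probabilities."""
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical validation data for one model.
y = [0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.5, 0.7, 0.4, 0.6, 0.9]

# Sensitivity analysis over hypothetical (cost_fp, cost_fn) pairs.
for cost_fp, cost_fn in [(1, 5), (1, 1), (5, 1)]:
    print(f"FP:FN = {cost_fp}:{cost_fn}",
          "empirical cutoff =", min_cost_threshold(y, scores, cost_fp, cost_fn),
          f"calibrated cutoff = {calibrated_threshold(cost_fp, cost_fn):.3f}")
```

On this toy data, making false negatives five times as costly pulls the empirical cutoff down to 0.4, while making false positives five times as costly pushes it up to 0.9. The two columns disagree here because the toy scores are not calibrated probabilities.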

Sorry about the long post, and thank you for your help in advance!

