JamieTee

I want to build predictive models to identify patients who may have been misdiagnosed with a certain disease. However, my data are spread across multiple databases and I do not want to combine them into one set, so I need to build separate predictive models for each database. On top of that, I want to try several modeling approaches and keep the most accurate model.
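
Because "most accurate" itself depends on where the cutoff is placed, one threshold-free way to compare candidate models within a single database is ROC AUC (the probability that a randomly chosen misdiagnosed patient receives a higher score than a randomly chosen correctly diagnosed one). This is a swapped-in criterion, not something from the post, and the sketch below is a minimal plain-Python illustration under that assumption: `roc_auc`, the model names, and the data are all hypothetical, and the scores are assumed to come from held-out data.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels (1 = misdiagnosed) and predicted probabilities
# from two hypothetical candidate models fitted on the same database.
labels: List[int] = [0, 0, 1, 1, 0, 1]
model_scores: Dict[str, List[float]] = {
    "logistic": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
    "tree": [0.3, 0.6, 0.5, 0.7, 0.4, 0.45],
}

# Compare models without committing to any cutoff yet.
aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best_model = max(aucs, key=aucs.get)
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration; a rank-based computation scales better on large validation sets.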

I'm currently stuck on how to choose the optimal decision threshold for classifying each model's predicted probability as misdiagnosed (1) or not misdiagnosed (0). I've read that Youden's index (J = sensitivity + specificity - 1) can be used to find the optimal cutoff for a single model. Since I'll have multiple models for each database, though, does it make sense to use the SAME cutoff across all models and databases, or one cutoff per database (kept consistent across the models within that database)?

I'm a bit lost on what the best approach is, and I haven't been able to find papers that explain in detail how they chose decision thresholds for multiple models at once.
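
Youden's J can be computed at any cutoff, and the "optimal" cutoff is simply the one that maximizes it. The sketch below is a minimal plain-Python illustration with made-up data: it scans every observed score as a candidate cutoff for one model in each of two hypothetical databases (`db_a`, `db_b`), then checks how much J drops in each database if `db_a`'s cutoff were reused everywhere. The function names, database names, and data are hypothetical, and "score >= cutoff means misdiagnosed" is an assumed convention.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def youden_j(labels: Sequence[int], scores: Sequence[float], t: float) -> float:
    """J = sensitivity + specificity - 1 when score >= t is called positive."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
    return tp / n_pos + tn / n_neg - 1


def youden_threshold(labels: Sequence[int],
                     scores: Sequence[float]) -> Tuple[Optional[float], float]:
    """Try every observed score as a cutoff; return (cutoff, maximum J)."""
    best_t: Optional[float] = None
    best_j = float("-inf")
    for t in sorted(set(scores)):
        j = youden_j(labels, scores, t)
        if j > best_j:  # strict '>': exact ties keep the lowest cutoff
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical held-out (labels, predicted probabilities) for ONE model per
# database; real values would come from each database's validation set.
validation: Dict[str, Tuple[List[int], List[float]]] = {
    "db_a": ([0, 0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.3, 0.55, 0.5, 0.7, 0.9]),
    "db_b": ([0, 0, 0, 1, 1, 1, 1], [0.05, 0.15, 0.35, 0.25, 0.45, 0.6, 0.8]),
}

# Option 1: a separate Youden-optimal cutoff for each database.
per_db = {db: youden_threshold(y, s) for db, (y, s) in validation.items()}

# Option 2: reuse db_a's cutoff in every database and measure J there.
shared_t = per_db["db_a"][0]
shared_j = {db: youden_j(y, s, shared_t) for db, (y, s) in validation.items()}
```

With these toy numbers the per-database cutoffs differ (0.5 for `db_a`, 0.45 for `db_b`), and reusing `db_a`'s cutoff lowers J in `db_b` from 0.75 to 0.5, so comparing the two options this way on each database's held-out data is one empirical way to decide. Picking and evaluating a cutoff on the same data gives an optimistic J.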

P.S. I've read that you can use the cost of each outcome (true positive, true negative, false positive, false negative; TP, TN, FP, FN) to determine the threshold, but these costs are unknown for my disease of interest.
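
If the costs can be stated even roughly, standard decision theory gives the cutoff directly for calibrated probabilities: calling a patient misdiagnosed has lower expected cost when p*C_TP + (1-p)*C_FP < p*C_FN + (1-p)*C_TN, which rearranges to p > (C_FP - C_TN) / ((C_FP - C_TN) + (C_FN - C_TP)). Since the costs here are unknown, the sketch below sweeps a few hypothetical FN:FP cost ratios instead of committing to one. The function name and ratios are illustrative assumptions, and correct decisions (TP, TN) are assumed to cost nothing unless passed in.

```python
from typing import Dict


def bayes_threshold(c_fp: float, c_fn: float,
                    c_tp: float = 0.0, c_tn: float = 0.0) -> float:
    """Cutoff on a calibrated probability p above which predicting
    'misdiagnosed' has lower expected cost than 'not misdiagnosed'.

    Predict positive when p*c_tp + (1-p)*c_fp < p*c_fn + (1-p)*c_tn,
    i.e. p > (c_fp - c_tn) / ((c_fp - c_tn) + (c_fn - c_tp)).
    """
    regret_fp = c_fp - c_tn  # extra cost of a false alarm over a correct "no"
    regret_fn = c_fn - c_tp  # extra cost of a miss over a correct "yes"
    if regret_fp <= 0 or regret_fn <= 0:
        raise ValueError("errors must cost more than the matching correct decisions")
    return regret_fp / (regret_fp + regret_fn)


# Costs are unknown, so sweep hypothetical FN:FP cost ratios (FP cost fixed at 1).
ratios = [1, 2, 5, 10]
thresholds: Dict[int, float] = {
    r: bayes_threshold(c_fp=1.0, c_fn=float(r)) for r in ratios
}
for r, t in thresholds.items():
    print(f"FN cost {r}x FP cost -> call misdiagnosed if p > {t:.3f}")
```

Only the ratio of the costs matters, so a rough clinical judgment such as "missing a misdiagnosis is about five times worse than a false alarm" is enough to pick a row. This cutoff does not depend on which database a model came from, but it is only meaningful if each model's predicted probabilities are well calibrated on its own database.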

Sorry about the long post, and thank you for your help in advance!