Dennisky
Quartz | Level 8

Dear all,

We want to create a new test (Method A) to screen for a disease.

There is already a gold-standard screening test for the disease (Method B).

The result of the gold-standard test (Method B) is binary (Positive vs. Negative).

However, the result of the new test (Method A) has three classes (Positive vs. Negative vs. Doubtful).

Can we still calculate the sensitivity and specificity to describe the accuracy of the new test (Method A)? (See Table 1.)

Accepted Solution
sbxkoenk
SAS Super FREQ

Hello @Dennisky ,

 

Why not?

 

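Sensitivity is the metric that evaluates a test's ability to detect true positives, i.e. the proportion of truly diseased subjects that the test classifies as Positive.

Sensitivity = recall = true positive rate.

 

To calculate recall, use the formula TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives.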
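A minimal Python sketch of the recall formula. The function name `recall` is illustrative only; the usage line plugs in the counts that Dennis reports later in this thread (80 of the 100 diseased people detected).

```python
def recall(tp: int, fn: int) -> float:
    """Sensitivity (recall, true positive rate) = TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("TP + FN must be positive (no diseased subjects)")
    return tp / (tp + fn)


if __name__ == "__main__":
    # 80 diseased people flagged Positive, 20 diseased people missed
    print(recall(80, 20))  # prints 0.8
```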


 

You have to collapse Doubtful and Negative into a single Negative category.

I don't think it's a good idea to play with weights (treating Doubtful as partly Negative and partly Positive at the same time). Do not do that. Be "conservative" in your statistical choices.
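Collapsing can be sketched as a relabelling step applied before any 2 x 2 statistic is computed. This is a minimal Python sketch, assuming each Method A result is stored as one of the strings "Positive", "Negative" or "Doubtful"; the names `collapse` and `doubtful_as` are hypothetical. Doubtful is merged wholesale into one class, with no fractional weights.

```python
from collections import Counter

VALID = {"Positive", "Negative", "Doubtful"}


def collapse(label: str, doubtful_as: str = "Negative") -> str:
    """Map a 3-class Method A result onto the binary scale of Method B.

    Doubtful is merged entirely into `doubtful_as` (Negative by default),
    rather than being split with weights between Positive and Negative.
    """
    if label not in VALID:
        raise ValueError(f"unexpected label: {label!r}")
    if doubtful_as not in ("Positive", "Negative"):
        raise ValueError("doubtful_as must be 'Positive' or 'Negative'")
    return doubtful_as if label == "Doubtful" else label


if __name__ == "__main__":
    results = ["Positive", "Doubtful", "Negative", "Doubtful"]  # toy input
    print(Counter(collapse(r) for r in results))
```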

 

Thanks,

Koen


Dennisky
Quartz | Level 8

Dear Prof. Koen,
Thank you very much. We really appreciate your help in resolving the problem.
The choice of statistical method is very important in a diagnostic-test study.
Following your suggestion, we will collapse Doubtful and Negative into Negative in order to be "conservative" in our statistical choices (that is, Doubtful results are treated as Negative in our study).
On this premise, can we also calculate the specificity, PPV, NPV and the other measures that together describe how valid a test is? However, to calculate the specificity, we would need to collapse Doubtful and Positive into Positive.

 

For example, 1000 people participated in our study.

The MR examination (Method B) is considered the gold-standard test: 100 people have the disease and 900 do not.

The new test, Method A, has three classes (Positive vs. Negative vs. Doubtful). All 1000 people were also tested with Method A:

180 people tested Positive, 720 tested Negative, and 100 were Doubtful.

Of the 100 people classified as Doubtful by Method A, 10 were diagnosed with the disease and 90 without it by the gold-standard test (Method B) (see Table S1).

[Image: Table S1]

 

Now we calculate the sensitivity, positive predictive value (PPV), specificity, negative predictive value (NPV) and accuracy.

 

To calculate the sensitivity and PPV, we collapse Doubtful and Negative into Negative in order to be "conservative" in our statistical choices; that is, the 100 Doubtful people are counted as Negative (Table S2).

Sensitivity = TP / (TP + FN):

(80/100) × 100 = 80%

Positive Predictive Value = TP / (TP + FP):

(80/180) × 100 = 44.4%

 

[Image: Table S2]

Conversely, to calculate the specificity and NPV, we collapse Doubtful and Positive into Positive; that is, the 100 Doubtful people are counted as Positive (Table S3).

Specificity = TN / (TN + FP):

(710/900) × 100 = 78.9%

Negative Predictive Value = TN / (TN + FN):

(710/720) × 100 = 98.6%

 

[Image: Table S3]
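The Table S3 calculation can be checked the same way. This minimal Python sketch uses the Table S1 cells derived from the counts in the post (diseased/not diseased: 80/100 for Positive, 10/710 for Negative, 10/90 for Doubtful), this time with Doubtful merged into Positive; the variable names are illustrative.

```python
# Table S1 cells (diseased, not diseased) per Method A result, as derived
# from the totals in the post; Doubtful is merged into Positive (Table S3).
s1 = {"Positive": (80, 100), "Negative": (10, 710), "Doubtful": (10, 90)}

tp = s1["Positive"][0] + s1["Doubtful"][0]  # 80 + 10 = 90
fp = s1["Positive"][1] + s1["Doubtful"][1]  # 100 + 90 = 190
fn, tn = s1["Negative"]                     # 10, 710

specificity = tn / (tn + fp)
npv = tn / (tn + fn)
print(f"Specificity = {specificity:.1%}, NPV = {npv:.1%}")  # Specificity = 78.9%, NPV = 98.6%
```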

Moreover, to calculate the accuracy, we take the data from Tables S2 and S3 at the same time (TP and FN from Table S2, TN and FP from Table S3).

Accuracy:

((80 + 710) / (80 + 710 + 20 + 190)) × 100 = 79%
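Because every subject is counted exactly once in this mixed accuracy (diseased people through TP and FN of Table S2, non-diseased people through TN and FP of Table S3), it equals prevalence × sensitivity + (1 - prevalence) × specificity. This minimal Python sketch checks that identity with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# TP and FN come from Table S2 (Doubtful -> Negative);
# TN and FP come from Table S3 (Doubtful -> Positive).
tp_s2, fn_s2 = 80, 20
tn_s3, fp_s3 = 710, 190

n = tp_s2 + fn_s2 + tn_s3 + fp_s3  # 1000: each subject counted exactly once
accuracy = Fraction(tp_s2 + tn_s3, n)

# Same value written as prevalence-weighted sensitivity and specificity.
prevalence = Fraction(tp_s2 + fn_s2, n)
sensitivity = Fraction(tp_s2, tp_s2 + fn_s2)
specificity = Fraction(tn_s3, tn_s3 + fp_s3)
assert accuracy == prevalence * sensitivity + (1 - prevalence) * specificity

print(f"Accuracy = {float(accuracy):.1%}")  # Accuracy = 79.0%
```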

 

 

 

Am I right? If any point is unreasonable, please give us your valuable comments.

Thanks a lot! 

Dennis

 

sbxkoenk
SAS Super FREQ

Hello @Dennisky ,

 

You raise important points. How does being "conservative in the statistical choices" translate into collapsing the 3 outcome categories into 2 when calculating the other performance statistics that are classically derived from a 2 x 2 confusion matrix?

 

I will print your post and give it some thought.
More news tomorrow.

 

I will also check whether there is some kind of "agreement score" for a frequency table with 3 categories in the rows and 2 categories in the columns.

 

Thanks,

Koen

Dennisky
Quartz | Level 8

Dear Prof. Koen,

Thank you for your timely response.

We look forward to your suggestions!

Additionally, we have tried to calculate a kappa value to measure the agreement between the two methods.

As we pointed out, the two methods have different numbers of result categories (three for Method A, two for Method B).

So we assigned a small value (e.g., 0.001) to each empty (zero) cell of the table in order to calculate the kappa value between the two methods (see Tables S4 and S5). We are not sure whether this statistical method is correct.

[Images: Tables S4 and S5]
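For reference, Cohen's kappa is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement (the diagonal) and p_e the agreement expected by chance from the row and column margins. The Python sketch below applies that formula to a 3 x 3 table built from the Table S1 counts, with the gold standard's Doubtful column set to zero; Tables S4 and S5 are not reproduced here, so this layout is an assumption, and `cohen_kappa` is a hypothetical helper. Computed directly from the formula, an all-zero column needs no 0.001 padding: the zero cells add nothing to p_o and the zero margin adds nothing to p_e. Padding of that kind is usually a workaround for software that drops empty levels and then no longer sees a square table.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts.

    Rows: Method A; columns: Method B, both in the same category order.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("kappa needs a square table")
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(table[i][i] for i in range(k)) / n                  # observed agreement
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Category order: Positive, Negative, Doubtful. Method B never says Doubtful,
# so its last column is all zeros (exact zeros, no 0.001 padding).
table = [
    [80, 100, 0],
    [10, 710, 0],
    [10, 90, 0],
]
print(f"kappa = {cohen_kappa(table):.4f}")  # kappa = 0.3713
```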

Dennisky
Quartz | Level 8
Hi Prof. Koen @sbxkoenk

Is there any other information we should provide to you about this problem? We look forward to your suggestions!
Thank you very much!
Dennis
sbxkoenk
SAS Super FREQ

No, sorry!

Your information is very clear.
But I had lost sight of this a bit.
Thanks for the reminder.

 

Just a small correction: I am a consultant/statistician with SAS in the BeNeLux region, but I am not a professor. I am flattered, but I cannot claim that title. 😊

 

As for the calculation of sensitivity, specificity, PPV and NPV, I think you did it correctly (merging Doubtful into Negative or Positive, as appropriate, in order to be conservative).
However, I don't know of any papers where this was solved in the same way. So if you publish this in a scientific journal, it might take some convincing.

 

I don't have time today or tomorrow, but at the weekend I will take a look at the kappa agreement (or other agreement scores and tests).
That way, you'll be a bit more in line with the standard accepted methods.

 

Have a nice end-of-the-week,

Koen

Dennisky
Quartz | Level 8
Dear Koen @sbxkoenk
Please forgive us for disturbing you.
Thank you again.
Dennis
sbxkoenk
SAS Super FREQ

Hello,

 

Small update.
I don't really have anything to add at the moment. 🤐😞

 

It would of course be good if you could derive a Doubtful category from the gold-standard test as well.

But that outcome might be strictly binary (?).

If the gold-standard test produces a score that is compared with a threshold (to decide on 0 or 1), you could treat values very close to the threshold as "Doubtful".
That way, you would have a 3 x 3 confusion matrix.
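Koen's suggestion can be sketched in Python as follows. Everything here is hypothetical: it assumes the gold-standard result is a numeric score where higher values indicate disease, and `threshold` and `margin` (the half-width of the Doubtful band around the threshold) are placeholders that would need clinical justification. The function names are illustrative, and the toy scores in the usage block are not study data.

```python
from collections import Counter

CATEGORIES = ("Positive", "Negative", "Doubtful")


def classify_score(score: float, threshold: float, margin: float) -> str:
    """Three-way call: scores within `margin` of the threshold are Doubtful."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    if abs(score - threshold) <= margin:
        return "Doubtful"
    return "Positive" if score > threshold else "Negative"


def confusion_3x3(method_a, method_b):
    """Rows: Method A result; columns: three-way gold-standard result."""
    counts = Counter(zip(method_a, method_b))
    return [[counts[(a, b)] for b in CATEGORIES] for a in CATEGORIES]


if __name__ == "__main__":
    scores_b = [0.91, 0.52, 0.10]  # hypothetical gold-standard scores
    calls_b = [classify_score(s, threshold=0.5, margin=0.05) for s in scores_b]
    calls_a = ["Positive", "Doubtful", "Negative"]  # hypothetical Method A results
    print(confusion_3x3(calls_a, calls_b))
```

The resulting table is square, so agreement statistics such as kappa can be computed on it without padding.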

 

Thanks,

Koen
