Dennisky
Quartz | Level 8

Dear all,

We want to develop a new test (Method A) to screen for a disease.

There is already a 'gold-standard' screening test for the disease (Method B).

The result of the gold-standard test (Method B) is binary (Positive vs. Negative).

However, the result of the new test (Method A) has three classes (Positive vs. Negative vs. Doubtful).

Can we still calculate sensitivity and specificity as measures of the accuracy of the new test (Method A)? (See Table 1.)

[Table 1 image: 123.png]

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello @Dennisky ,

 

Why not?

 

Sensitivity is the metric that evaluates a model's ability to predict true positives.

Sensitivity = recall = true positive rate!

 

To calculate "recall", use the following formula: TP / (TP + FN).

[attached image: sbxkoenk_0-1644685048566.png]

 

You have to collapse Doubtful and Negative into Negative.

I don't think it's a good idea to play with weights (considering Doubtful as a semi-Negative and a semi-Positive at the same time). Do not do that. Be "conservative" in your statistical choices.
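To make the collapsing concrete, here is a minimal sketch in plain Python (the same logic is easy to reproduce in a SAS DATA step); the label strings and the tiny example data are made up for illustration:

```python
# Minimal sketch: collapse "Doubtful" into "Negative" (the conservative
# choice), then compute sensitivity (= recall = TP / (TP + FN)).
def sensitivity(method_a, gold_b):
    """method_a: 3-class test results; gold_b: binary gold-standard results."""
    collapsed = ["Positive" if a == "Positive" else "Negative" for a in method_a]
    tp = sum(1 for a, b in zip(collapsed, gold_b) if a == "Positive" and b == "Positive")
    fn = sum(1 for a, b in zip(collapsed, gold_b) if a == "Negative" and b == "Positive")
    return tp / (tp + fn)

# Tiny made-up example: the one Doubtful subject who is truly diseased
# becomes a false negative under this collapsing.
a = ["Positive", "Doubtful", "Negative", "Positive"]
b = ["Positive", "Positive", "Negative", "Negative"]
print(sensitivity(a, b))  # 1 TP, 1 FN -> 0.5
```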

 

Thanks,

Koen


8 REPLIES 8

Dennisky
Quartz | Level 8

Dear Prof. Koen,
Thank you very much. We really appreciate your help in resolving the problem.
It's very important for the statistical methodology of the diagnostic test.
Following your suggestion, we will collapse Doubtful and Negative into Negative in order to be "conservative" in our statistical choices (that is, Doubtful is treated as Negative in our study).
On this premise, can we also calculate the specificity, PPV, NPV and the other measures that together describe how valid a test is? But if we want to calculate the specificity, it seems we need to collapse Doubtful and Positive into Positive.

 

For example, suppose 1,000 people participated in our study.

The MR examination (Method B) is the gold-standard test (100 people have the disease and 900 do not).

The new test, Method A, has three classes (Positive vs. Negative vs. Doubtful). The same 1,000 people are also tested by Method A:

180 people test Positive, 720 test Negative, and 100 are Doubtful.

Moreover, of the 100 people who are Doubtful by Method A, 10 have been diagnosed with the disease and 90 without it by the gold-standard test (Method B). (See table S1.)

[Table S1 image: s1234.png]

 

Now we calculate the sensitivity, positive predictive value (PPV), specificity, negative predictive value (NPV), and accuracy.

 

When we calculate the sensitivity and positive predictive value (PPV), we collapse Doubtful and Negative into Negative in order to be "conservative" in the statistical choices (we calculated them from the data in table S2, where the 100 Doubtful people are counted as Negative).

Sensitivity:

(80/100) × 100 = 80%

Positive predictive value:

(80/180) × 100 = 44.4%

 

[Table S2 image: s2.png]

On the other hand, when we calculate the specificity and negative predictive value (NPV), the Doubtful results are counted as Positive (we calculated them from the data in table S3, where the 100 Doubtful people are counted as Positive).

Specificity:

(710/900) × 100 = 78.9%

Negative predictive value:

(710/720) × 100 = 98.6%

 

[Table S3 image: s3.png]

Moreover, when we calculate the accuracy, we draw the data from tables S2 and S3 at the same time.

Accuracy:

((80 + 710) / (80 + 710 + 20 + 190)) × 100 = 79%
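The arithmetic above can be checked with a short Python sketch; all of the counts are taken directly from the worked example in this post:

```python
# Checking the worked numbers with plain arithmetic (n = 1000).

# Doubtful -> Negative (table S2): sensitivity and PPV
TP_s2, FN_s2, FP_s2 = 80, 20, 100            # 100 diseased, 180 test-positives
sens = TP_s2 / (TP_s2 + FN_s2)               # 80/100
ppv = TP_s2 / (TP_s2 + FP_s2)                # 80/180

# Doubtful -> Positive (table S3): specificity and NPV
TN_s3, FP_s3, FN_s3 = 710, 190, 10           # 900 non-diseased, 720 test-negatives
spec = TN_s3 / (TN_s3 + FP_s3)               # 710/900
npv = TN_s3 / (TN_s3 + FN_s3)                # 710/720

# Accuracy drawn from tables S2 and S3 at the same time
acc = (TP_s2 + TN_s3) / (TP_s2 + TN_s3 + FN_s2 + FP_s3)  # 790/1000

print(round(sens, 3), round(ppv, 3), round(spec, 3), round(npv, 3), round(acc, 3))
# 0.8 0.444 0.789 0.986 0.79
```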

 

 

 

Are we right? If there is any unreasonable point, please oblige us with your valuable comments.

Thanks a lot! 

Dennis

 

sbxkoenk
SAS Super FREQ

Hello @Dennisky ,

 

You raise important points. How does being "conservative in the statistical choices" translate into collapsing the 3 outcome categories (to 2) when calculating the other performance statistics that are classically computed from a 2 × 2 confusion matrix?

 

I will print your post and give it some thought.
More news tomorrow.

 

I will also check whether there exists a kind of "agreement score" for a frequency table with 3 categories in the rows and 2 categories in the columns.

 

Thanks,

Koen

Dennisky
Quartz | Level 8

Dear Prof. Koen,

Thank you for your timely response.

We look forward to your suggestions!

Additionally, we have tried to calculate a kappa value to show the agreement between the two methods.

As we pointed out, the number of classification levels in the results of the two methods is inconsistent.

So we assigned a small value (e.g., 0.001) to the zero cells in the table when calculating the kappa value between the two methods (e.g., see tables S4 and S5). (We are not sure whether this statistical method is correct.)

[Table S4 and S5 images: s4.png, S5.png]
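As an alternative to padding the zero cells, one option is to collapse Doubtful first so that both methods are binary, and then compute Cohen's kappa from the resulting 2 × 2 table. The plain-Python sketch below uses the counts implied by table S2 (Doubtful counted as Negative: TP = 80, FP = 100, FN = 20, TN = 800); it is only an illustration, not a recommendation over the padding approach:

```python
# Cohen's kappa from a 2 x 2 table, computed by hand (no packages needed).
def cohen_kappa_2x2(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    po = (tp + tn) / n                          # observed agreement
    # chance agreement from the marginal totals of the two methods
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (po - pe) / (1 - pe)

# Counts following table S2 (Doubtful -> Negative)
print(round(cohen_kappa_2x2(80, 100, 20, 800), 3))  # 0.508
```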

Dennisky
Quartz | Level 8
Hi Prof. Koen @sbxkoenk

Is there any other information we need to provide to you for this problem? We look forward to your suggestions!
Thank you very much!
Dennis
sbxkoenk
SAS Super FREQ

No, sorry!

Your information is very clear.
But I had lost sight of this a bit.
Thanks for the reminder.

 

Just a small correction: I am a consultant and statistician with SAS in the BeNeLux region, but I am not a professor. I am flattered, but I cannot claim that title. 😊

 

As for the calculation of sensitivity, specificity, PPV and NPV, I think you did it correctly (merging Doubtful into Negative or Positive in order to be conservative).
But I don't actually know of any papers where it was solved in the same way. So if you want to publish this in a scientific journal, it might take some convincing.

 

I don't have time today nor tomorrow, but at the weekend I will take a look at that kappa agreement (or other agreement scores and tests).
That way, you'll be a bit more in line with the standard accepted methods.

 

Have a nice end-of-the-week,

Koen

Dennisky
Quartz | Level 8
Dear Koen @sbxkoenk
Please forgive us for disturbing you.
Thank you again.
Dennis
sbxkoenk
SAS Super FREQ

Hello,

 

Small update.
I don't really have anything to add at the moment. 🤐😞

 

It would of course be good if you could distil a Doubtful category from the gold-standard test as well.

But that outcome might be strictly binary (?).

If the gold-standard test is a score compared against a threshold (to decide on 0 or 1), you could consider values very close to the threshold as "Doubtful".
That way you get a 3 × 3 confusion matrix.
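This threshold-band idea can be sketched in a few lines of Python; the threshold of 0.5 and the band width of 0.05 are invented illustration values, not taken from any real test:

```python
# Sketch: derive a "Doubtful" class from a continuous gold-standard score
# by treating values within a narrow band around the threshold as Doubtful,
# then cross-tabulate against the 3-class Method A results.
from collections import Counter

def three_class(score, threshold=0.5, band=0.05):
    if abs(score - threshold) <= band:
        return "Doubtful"
    return "Positive" if score > threshold else "Negative"

def confusion_3x3(method_a, gold_scores):
    labels = ["Positive", "Doubtful", "Negative"]
    counts = Counter((a, three_class(s)) for a, s in zip(method_a, gold_scores))
    # rows: Method A classes; columns: gold-standard classes
    return [[counts[(a, g)] for g in labels] for a in labels]

a = ["Positive", "Doubtful", "Negative", "Positive"]
scores = [0.9, 0.52, 0.1, 0.47]
print(confusion_3x3(a, scores))  # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
```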

 

Thanks,

Koen

