We hear a lot in the news today about tests for Covid-19. Tests for the virus. Tests for the antibodies. Serological tests that require a blood sample. Nasopharyngeal tests where a nurse sticks an antiseptic swab up your nose and tells you, “Don’t worry, it will just feel like I am tickling your brain.” (It actually does not feel that bad. Really. I swear. It’s mildly uncomfortable at most, and it’s over in the flashiest of flashes.)
In this article, I shed some light on the biostatistics terms used to describe the accuracy of a test. And I show you how to use SAS Studio to find sensitivity, specificity, associated confidence limits, positive predictive value, and negative predictive value. Check out other articles in this series as well: SAS Helps You Understand Disease Spread: Part 2 Biostat Concepts in the Time of Coronavirus and Network Analysis in SAS Visual Analytics: Part 3 of Biostats in the Time of Coronavirus.
First of all, let's acquaint ourselves with the fact that there is no perfect test. Every test, when conducted on a large number of people, will likely return at least some false positive and some false negative results. But the ideal test minimizes both false negatives and false positives.
For a general example, let's assume 2100 people are tested to see if they have a certain disease. 100 actually have the disease and 2000 do not. This gives a prevalence of disease among those tested of about 4.8% (100/2100).
Prevalence most commonly refers to the prevalence of the disease in the population as a whole. For your purposes, that population might be your county, your state or province, your country, or some other group of interest to you. In reality, the people who get tested are often NOT representative of the population as a whole. But let's assume for this example that these 2100 people are representative of the full population. In that case, the prevalence of the disease in the general population would also be about 4.8%.
We can look at sensitivity and specificity (using the columns of this two-by-two table).
Or we can look at positive predictive value and negative predictive value (using the rows of this two-by-two table).
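As a quick illustration of how the columns and rows are used (in Python rather than SAS, purely to show the arithmetic), here is a sketch based on imaginary counts; the names TP, FP, FN, TN and the counts are mine, chosen to match the imaginary example later in this article:

```python
# Imaginary 2x2 counts (not real data):
#                 diseased   not diseased
# test positive        90            100
# test negative        10           1900
TP, FP = 90, 100     # true positives, false positives
FN, TN = 10, 1900    # false negatives, true negatives

sensitivity = TP / (TP + FN)   # columns: among those who truly have the disease
specificity = TN / (TN + FP)   # columns: among those who truly do not
ppv = TP / (TP + FP)           # rows: among those who tested positive
npv = TN / (TN + FN)           # rows: among those who tested negative

print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}, "
      f"PPV={ppv:.3f}, NPV={npv:.3f}")
# sensitivity=0.900, specificity=0.950, PPV=0.474, NPV=0.995
```

Notice that sensitivity and specificity describe the test itself, while PPV and NPV describe what a particular result means for the person who received it.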
A highly sensitive test means that there are few false negative results, and thus fewer cases of disease are missed.
False negatives can have dire consequences in the case of a potentially debilitating or fatal (but treatable) disease: the individual may not get the care or treatment they need. In the worst case, they could die.
A highly specific test means that there are few false positive results, and thus fewer people are wrongly told they have the disease.
False positives are of most concern when they drive decisions such as highly invasive additional testing or highly invasive treatment. For example, a false positive for a malignant brain tumor that suggests brain surgery is needed would be a big problem.
The ideal test minimizes both false negatives and false positives, and thus has both high sensitivity and high specificity. However, no test is 100% correct.
The PPV and NPV of a test depend heavily on the prevalence of the condition the test is intended to detect. At present, we don't know the prevalence of SARS-CoV-2 antibody-positive individuals in the U.S. population or in the world. Prevalence can change based on how long the virus has been in a country and the effectiveness of mitigation measures, and it can vary widely between locations and between groups of people, such as health care workers, because of different rates of infection. So positive and negative predictive values (PPV and NPV) are calculated using a test's sensitivity, its specificity, and an estimate of the prevalence of SARS-CoV-2. Because all tests return some false positive and some false negative results, test results should be interpreted alongside other relevant information, such as clinical history or other test results.
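To see that dependence on prevalence concretely, here is a small sketch (in Python, outside SAS, just to show the arithmetic of Bayes' rule; the function names and example numbers are mine):

```python
def ppv(sens, spec, prev):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# The same imaginary test (90% sensitive, 95% specific) at two prevalences:
for prev in (0.01, 0.20):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.90, 0.95, prev):.3f}, "
          f"NPV = {npv(0.90, 0.95, prev):.3f}")
# prevalence 1%: PPV = 0.154, NPV = 0.999
# prevalence 20%: PPV = 0.818, NPV = 0.974
```

With a 90%-sensitive, 95%-specific test, the PPV is only about 15% at 1% prevalence but rises above 80% at 20% prevalence, while the NPV moves in the opposite direction.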
IMPORTANT! Estimates of sensitivity and specificity are exactly that: estimates. They come with 95% confidence intervals. The more samples used to validate a test, the narrower the confidence interval becomes, meaning that we can be more confident in the reported estimates of sensitivity and specificity.
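One way to see how the interval narrows with sample size is the Wilson score interval, sketched below in Python. This is my own illustration; PROC FREQ can report several interval types, and this is not necessarily the one your SAS output uses.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion
    (e.g., an estimated sensitivity of successes/n)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# The same observed sensitivity (90%) with more and more diseased subjects:
for n in (10, 100, 1000):
    lo, hi = wilson_ci(round(0.9 * n), n)
    print(f"n={n:5d}: 95% CI ({lo:.3f}, {hi:.3f})")  # the interval narrows as n grows
```

With only 10 subjects, an observed 90% sensitivity is compatible with a true sensitivity anywhere from roughly 60% to 98%; with 1000 subjects the interval shrinks to a few percentage points.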
The Maryland Department of Health Laboratory is testing in Maryland using the Centers for Disease Control and Prevention (CDC) 2019-Novel Coronavirus (2019-nCoV) Real-Time Reverse Transcriptase (RT)-PCR Diagnostic Panel, which has been granted emergency use authorization (EUA) by the U.S. Food and Drug Administration (FDA). A PCR (polymerase chain reaction) test is a molecular test: it detects the genetic material of the virus itself, not viral antigens or antibodies.
According to this article, the Stanford Health Care reverse transcriptase–polymerase chain reaction test for SARS-CoV-2 (approved on March 2, 2020) has an estimated sensitivity of 100% (95% CI, 72%-100%) and an estimated specificity of 95% (95% CI, 74%-100%). Why are these confidence limits so wide? Because these results are based on a sample of only 30 subjects!
A nice summary from the article follows:
“Serological tests test for the presence of antibodies in the blood, i.e., they test for the body’s immune response to the virus rather than the virus itself. In the early days of an infection when the body's immune response is still building, antibodies may not be detected.
A test's sensitivity can be estimated by determining whether or not it is able to detect antibodies in blood samples from patients who have been confirmed to have COVID-19 with a nucleic acid amplification test, or NAAT. In some validation studies of these tests, like the one FDA is conducting in partnership with NIH, CDC, and BARDA, the samples used, in addition to coming from patients confirmed to have COVID-19 by a NAAT, may also be confirmed to have antibodies present using other serology tests.
A test's specificity can be estimated by testing large numbers of samples collected and frozen before SARS-CoV-2 is known to have circulated to demonstrate that the test does not produce positive results in response to the presence of other causes of a respiratory infection, such as other coronaviruses.”
First we enter our data. These are not real data! I am just using these numbers for illustration purposes.
data BethImaginaryTest;
   input Test Disease Count;
   * Test=1 means a positive test result; Disease=1 means the subject truly has the disease;
   datalines;
0 0 1900
0 1 10
1 0 100
1 1 90
;

title 'Beth Imaginary Test';
Then we sort the data in descending order of Test and Disease, so that the positive results (the 1s) come first.
proc sort data=BethImaginaryTest;
   by descending Test descending Disease;
run;
And finally we run PROC FREQ with a WEIGHT statement for Count, and request TABLES Test*Disease / SENSPEC.
proc freq data=BethImaginaryTest order=data;
   weight Count;
   tables Test*Disease / senspec;
run;
Notice the confidence limits! Remember that the confidence limits will depend on your total sample size, as well as the prevalence in your sample.
CAUTION: Although PROC FREQTAB in SAS Viya provides capabilities similar to those of PROC FREQ in SAS 9 SAS/STAT software, do not use PROC FREQTAB on Viya to calculate these biostatistical measures. ORDER=DATA is not an option in PROC FREQTAB because in a distributed environment like Viya, the data are divided across different nodes and threads. Therefore, if your positives are 1s and your negatives are 0s, the default ordering of the table will make your sensitivity, specificity, PPV, and NPV incorrect. You could fix this by making your positives 0s and your negatives 1s, but that would be somewhat confusing and counterintuitive.
What are you testing for?
What is the method of obtaining the sample?
Prevalence
There can be a big difference between the prevalence in the testing population and the prevalence in the total population. The black line in the graph below shows the prevalence in the testing population in Maryland. In the early days of testing, tests were not easy to get, so sick people were more likely to be tested. The prevalence of positive tests among those tested was at one point over 25%. But that was not the prevalence in the total population of Maryland. As tests became more readily available (currently, anyone can get a COVID-19 test in Maryland at any time for free), we can get closer to a real picture of the prevalence of COVID-19 in our state. Approximating the prevalence in the whole population helps us plan better, both for measures to reduce cases and for ensuring adequate hospital beds, PPE, and medical staff.
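To make that selection effect concrete, here is a hypothetical sketch (in Python, with numbers I invented; these are not Maryland data): if infected people are much more likely to seek testing, the positivity rate among the tested can be several times the true prevalence.

```python
# Hypothetical population (invented numbers, not real data)
population = 1_000_000
infected = population * 5 // 100        # 5% true prevalence -> 50,000 infected
healthy = population - infected         # 950,000 not infected

# Early in an outbreak, suppose the sick are far more likely to get tested:
tested_infected = infected * 60 // 100  # 60% of infected people seek a test
tested_healthy = healthy * 8 // 100     # only 8% of everyone else does

test_pool_prevalence = tested_infected / (tested_infected + tested_healthy)
print(f"True prevalence:         {infected / population:.1%}")
print(f"Prevalence among tested: {test_pool_prevalence:.1%}")
# True prevalence:         5.0%
# Prevalence among tested: 28.3%
```

Here a true prevalence of 5% shows up as roughly 28% among those tested, the same kind of gap the black line in the graph illustrates.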
If you are interested in data from all states, check out the dashboard from my alma mater, Johns Hopkins, where I studied biostatistics back in the old days.
Be careful and stay safe out there! And a big thank you to SAS for letting me work from home! Teaching live web classes rather than face-to-face has been a fun learning experience.
Great article, thank you! Expecting part 2 (maybe ROC curves and optimal cutoff?)!
Best regards
Hi. Great posting. I've read numerous articles where MDs are asked to use info about specificity, sensitivity, and prevalence to make a statement about whether a person has or does not have a particular condition. Wonderful summary (a bit disheartening) at
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521525/
A while ago, after reading all types of articles by MDs about COVID testing, I used SAS/GRAPH (GPLOT) to produce the charts shown here. As stated in your posting, prevalence makes a difference and I think that these charts might add a bit to the wonderful info you provided.
RULE: as prevalence increases, it's better to believe a positive test than a negative test
The SAS code I used is shown below (I used a DATA step to produce the plot data, not a SAS PROC).
data x;
   input sens spec;
   datalines;
.95 .95
.70 .95
;

data y;
   set x;
   do prev = .05 to 0.95 by .05;
      p_real = 10000 * prev;           /* truly positive subjects */
      n_real = 10000 - p_real;         /* truly negative subjects */
      pr_pt  = sens * p_real;          /* true positives          */
      nr_nt  = spec * n_real;          /* true negatives          */
      pr_nt  = p_real - pr_pt;         /* false negatives         */
      nr_pt  = n_real - nr_nt;         /* false positives         */
      p_test = pr_pt + nr_pt;          /* all who test positive   */
      n_test = pr_nt + nr_nt;          /* all who test negative   */
      pvpos  = round(100 * pr_pt / p_test, .1);   /* PPV, percent */
      pvneg  = round(100 * nr_nt / n_test, .1);   /* NPV, percent */
      output;
   end;
run;

goptions reset=all gunit=pct ftext="arial" htext=2pct border;
symbol1 i=j f='wingdings' v='6e'x h=3 c=red w=3;
symbol2 i=j f='wingdings' v='6c'x h=3 c=blue w=3;
axis1 label=(a=90 "PREDICTIVE VALUE OF TEST");
axis2 label=("PREVALENCE") order=(0.0 to 1.0 by 0.1) value=(h=2.0) minor=(n=1);
legend1 position=(bottom inside center) label=none offset=(3,) mode=share across=1
        value=(j=c c=red 'POSITIVE' c=blue ' NEGATIVE') shape=symbol(.0001,3);
* choose a title1 based on the sensitivity value;
title1 h=2.5 ls=2 "SENSITIVITY: 0.70 / SPECIFICITY: 0.95";
*title1 h=2.5 ls=5 "SENSITIVITY: 0.95 / SPECIFICITY: 0.70";
title2 a=90 ls=2;
title4 a=270 ls=2;
footnote1 ls=2;

proc gplot data=y;
   * use a WHERE statement to select data with sensitivity that matches the plot title;
   where sens = .70;
   plot (pvpos pvneg) * prev / overlay vaxis=axis1 haxis=axis2 noframe legend=legend1;
run;
quit;