Dear Experts,
I would appreciate your advice on calculating power and sample size for an ROC curve analysis.
Assume a prevalence of 6% positives on the gold-standard clinical interview among eligible subjects. I am not interested in detecting a difference from chance (AUC = 0.50). Instead, I want to show superiority to a particular value (e.g., AUC = 0.70), with reasonable power to detect it (e.g., power = 0.70 or higher). I expect the observed AUC for my test to be as high as about 0.90.
Much appreciated.
Ping
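For the numbers in the question, one closed-form approach from the literature (swapped in here, not proposed in this thread) uses the Hanley & McNeil (1982) approximation to the variance of the empirical AUC in a one-sided z-test of H0: AUC = 0.70 when the true AUC is 0.90. The sketch below assumes a one-sided alpha of 0.05, a target power of 0.70, and a number of positives equal to the prevalence times N, rounded. The function names are hypothetical. With only 6% positives the design will contain few positive cases, so the normal approximation is rough and worth checking by simulation.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_var(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate variance of the empirical AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)             # P(two random positives both score above one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # P(one positive scores above two random negatives)
    return (auc * (1.0 - auc)
            + (n_pos - 1) * (q1 - auc ** 2)
            + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)


def auc_power(n_total: int, prevalence: float, auc0: float, auc1: float,
              alpha: float = 0.05) -> float:
    """Approximate power of a one-sided z-test of H0: AUC = auc0 when the true AUC is auc1."""
    n_pos = round(prevalence * n_total)  # assumption: positives = prevalence * N
    n_neg = n_total - n_pos
    se0 = sqrt(hanley_mcneil_var(auc0, n_pos, n_neg))  # SE at the null value
    se1 = sqrt(hanley_mcneil_var(auc1, n_pos, n_neg))  # SE at the expected value
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha)
    # Reject when (AUC_hat - auc0) / se0 > z_crit; under H1, AUC_hat ~ N(auc1, se1^2).
    return z.cdf((auc1 - auc0 - z_crit * se0) / se1)


def min_total_n(prevalence: float, auc0: float, auc1: float,
                power: float = 0.70, alpha: float = 0.05, n_max: int = 100_000) -> int:
    """Smallest total N (positives + negatives) whose approximate power reaches the target."""
    for n_total in range(2, n_max + 1):
        n_pos = round(prevalence * n_total)
        if n_pos < 2 or n_total - n_pos < 2:
            continue  # too few subjects in a group for the approximation
        if auc_power(n_total, prevalence, auc0, auc1, alpha) >= power:
            return n_total
    raise ValueError("target power not reached within n_max")


n = min_total_n(prevalence=0.06, auc0=0.70, auc1=0.90, power=0.70)
print(n, round(0.06 * n), auc_power(n, 0.06, 0.70, 0.90))
```

The printed line gives the total N, the implied number of positives, and the approximate power at that N. At AUC = 0.50 the formula reduces to the exact null variance of the Mann-Whitney statistic, (n_pos + n_neg + 1) / (12 n_pos n_neg), which is a useful sanity check.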
I suspect this is not an easy problem, but it is an interesting question. I do not know the answer, but PROC POWER in SAS/STAT provides power and sample-size calculations for the likelihood ratio (LR) test in logistic regression, and there the power appears to depend on the distribution of the covariates and the correlation between the covariates, among other factors. It might also depend on your sampling scheme (e.g., are you oversampling a rare event?).
Again, I do not know the answer, but it seems like you have two options:
(1) Search the literature to see whether the answer is known and, if so, implement it in SAS by using the DATA step or PROC IML.
(2) Use simulation to estimate power and sample size, which might be simpler to implement. Again, you'll have to specify the distribution of the covariates and the parameters of the linear predictor, estimating them (difference of means, standard deviations, regression coefficients, ...) from a previous study or analysis.
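Option (2) can be sketched in Python under stated assumptions (in SAS the same loop could be written in the DATA step or PROC IML). The sketch assumes a binormal model for the test score (negatives ~ N(0, 1), positives ~ N(mu, 1), so the true AUC is Phi(mu / sqrt(2))), a fixed number of positives equal to the prevalence times N, and a one-sided test of H0: AUC = 0.70 based on the DeLong et al. (1988) variance estimate. The function names are hypothetical. Power is estimated as the fraction of simulated studies that reject H0.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def delong_auc_and_var(pos: list[float], neg: list[float]) -> tuple[float, float]:
    """Empirical (Mann-Whitney) AUC and its DeLong variance estimate."""
    def psi(x: float, y: float) -> float:
        return 1.0 if x > y else 0.5 if x == y else 0.0
    m = [[psi(x, y) for y in neg] for x in pos]  # pairwise comparison matrix
    v10 = [fmean(row) for row in m]              # placement value of each positive
    v01 = [fmean(col) for col in zip(*m)]        # placement value of each negative
    auc = fmean(v10)
    var = variance(v10) / len(pos) + variance(v01) / len(neg)
    return auc, var


def simulated_power(n_total: int, prevalence: float, auc0: float, auc1: float,
                    alpha: float = 0.05, n_sims: int = 500, seed: int = 1) -> float:
    """Fraction of simulated studies in which a one-sided test rejects H0: AUC = auc0."""
    rng = random.Random(seed)
    n_pos = round(prevalence * n_total)  # assumption: positives fixed by design
    n_neg = n_total - n_pos
    mu = sqrt(2.0) * NormalDist().inv_cdf(auc1)  # binormal shift giving true AUC = auc1
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(n_sims):
        pos = [rng.gauss(mu, 1.0) for _ in range(n_pos)]
        neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
        auc, var = delong_auc_and_var(pos, neg)
        se = sqrt(var)
        if se == 0.0:
            rejections += int(auc > auc0)  # degenerate sample, e.g. perfect separation
        else:
            rejections += int((auc - auc0) / se > z_crit)
    return rejections / n_sims


for n in (100, 150, 200):
    print(n, simulated_power(n, prevalence=0.06, auc0=0.70, auc1=0.90))
```

Scanning a grid of N values and keeping the smallest one whose estimated power reaches the target gives the sample size under these assumptions. Replacing the fixed positive count with a binomial draw would mimic simple random sampling instead of a design that fixes the number of positives, which is the sampling-scheme question raised above.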
The area under the ROC curve (AUC) assesses the discrimination ability of the model, as described in this note. A given model might discriminate well (or poorly) regardless of the sample size. For example, a model might discriminate poorly because it is missing one or more important predictors of the response event or because it does not properly specify an important predictor (for example, by omitting interactions or other higher-order terms). Simply adding more data will not improve the model's discriminating ability. So, I don't believe you can do a power analysis to determine the sample size needed to achieve a sufficiently large AUC. A large AUC comes from defining a model that includes the important predictors and specifies them properly.
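A small simulation illustrates the point about sample size. It assumes a binormal score with a fixed true AUC of 0.90 and 6% positives, and the function names are hypothetical. As N grows, the average empirical AUC stays near 0.90 and only its spread across replicate studies shrinks: sample size affects the precision of the estimated AUC, not the value being estimated.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def empirical_auc(pos: list[float], neg: list[float]) -> float:
    """AUC via the Mann-Whitney rank-sum identity (assumes no tied scores)."""
    labeled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    rank_sum_pos = sum(rank for rank, (_, is_pos) in enumerate(labeled, start=1) if is_pos)
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_replicates(n_total: int, prevalence: float = 0.06, true_auc: float = 0.90,
                   n_reps: int = 100, seed: int = 2) -> list[float]:
    """Empirical AUCs from n_reps simulated studies of size n_total (binormal scores)."""
    rng = random.Random(seed)
    mu = sqrt(2.0) * NormalDist().inv_cdf(true_auc)  # positives ~ N(mu, 1), negatives ~ N(0, 1)
    n_pos = round(prevalence * n_total)
    n_neg = n_total - n_pos
    return [empirical_auc([rng.gauss(mu, 1.0) for _ in range(n_pos)],
                          [rng.gauss(0.0, 1.0) for _ in range(n_neg)])
            for _ in range(n_reps)]


for n in (100, 1_000, 10_000):
    reps = auc_replicates(n)
    print(f"N = {n:>6}: mean AUC = {fmean(reps):.3f}, SD across studies = {stdev(reps):.3f}")
```

A model missing an important predictor would correspond to a smaller true_auc here, and no choice of N would move the average back up.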