Solved: ROC analysis for repeated measures

Merdock · Posted 07-21-2023 12:05 AM

I have a test dataset, similar to the one below:

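data have;
infile datalines expandtabs; /* data lines are tab-separated; expand tabs so list input can read them */
input ID$ CAT$ GROUP$ VISIT$ LAB STATUS$ BSL_CAT$;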
datalines;
a001	1	1	1	1997.02	0	1
a001	1	1	2	1275.52	0	1
a001	4	1	3	180.23	1	1
a001	2	1	4	735.91	0	1
a002	1	2	1	454.16	0	1
a002	1	2	3	1776.52	0	1
a002	3	2	4	73.15	1	1
a003	1	2	1	1700.26	0	1
a003	3	2	2	1621.32	1	1
a003	2	2	4	850.65	0	1
a004	2	3	1	1963.25	0	2
a004	2	3	2	544.87	0	2
a004	4	3	3	768.54	1	2
a004	2	3	4	780.16	0	2
a005	1	2	1	655.24	0	1
a005	2	2	4	722.14	0	1
a006	1	1	1	1472.06	0	1
a006	1	1	4	749.78	0	1
a007	2	1	1	848.88	0	2
a007	2	1	2	1482.78	0	2
a007	3	1	4	735.26	1	2
a008	1	1	1	1752.35	0	1
a008	1	1	2	1698.82	0	1
a008	3	1	3	1871.25	1	1
a008	4	1	4	587.35	1	1
a009	1	3	1	1549.89	0	1
a009	3	3	3	785.52	1	1
a009	1	3	4	384.72	0	1
a010	3	3	1	1211.95	1	3
a010	3	3	4	1596.38	1	3
a011	4	1	1	1785.45	1	4
a011	4	1	4	644.12	1	4
a012	3	3	1	798.28	1	3
a012	3	3	2	742.69	1	3
a012	3	3	3	1423.59	1	3
a012	3	3	4	1089.47	1	3
;
run;
proc print data=have; 
run;

where CAT is an ordinal categorical variable with 4 levels;

GROUP denotes the age group the participants are in;

VISIT is the follow up visit - participants have baseline (visit=1), and afterwards they can have up to 3 additional follow-up visits (visit=2,3,4);

LAB is a specific laboratory value;

STATUS is a binary variable denoting the severity of their disease and is based on CAT: if CAT in (1,2) then STATUS=0 (not severe), else if CAT in (3,4) then STATUS=1 (severe);

BSL_CAT is the baseline value of CAT;

I would like to estimate the predictive ability of LAB as a marker for the detection of disease severity (STATUS severe vs. not severe). I want to look at several cut-offs for the LAB variable and assess its diagnostic performance by computing accuracy, sensitivity, specificity, positive predictive value, negative predictive value and likelihood ratios.

How can I do this?

Here is what I was thinking/I've done so far:

1. Use PROC GLIMMIX (given the longitudinal nature of the observations) to investigate the association between STATUS as a binary outcome and LAB and VISIT as covariates, with VISIT as a random effect.

2. Then take the predicted probabilities from this model and feed them into a logistic model to get a ROC curve. After this I'm pretty much stuck, as I don't know how to obtain accuracy, sensitivity, specificity, PPV, NPV and likelihood ratios.

*MODEL OUTCOME AS BINARY;
proc glimmix data=have noclprint;
class ID VISIT (ref="1");
model STATUS (event='1')= LAB VISIT/ dist=binary link=logit solution;
random VISIT/subject=ID residual type=cs;
output out=FITDAT pred(ilink noblup)=predprob;
NLOPTIONS tech=NRRIDG Maxiter=1000; 
run;
proc print data=FITDAT; run;

*ROC CURVE BASED ON PREDICTED PROBABILITIES FROM GLIMMIX;
proc logistic data=FITDAT;
model STATUS (event="1")= / nofit;
roc 'GLMM Model' pred=predprob;
run;

Does anybody have any code or suggestions they can share to help?

Thank you kindly.

StatDave · Posted 07-22-2023 12:05 PM

If your final goal is to find an optimal cutoff, then note that there are statistics (like Youden's index and others) that are often used for that. These can be obtained using the ROCPLOT macro (or in PROC LOGISTIC if you have a recent version of SAS Viya). However, note that the unique predicted probabilities, which are the cutoffs used for the ROC curve, are computed using ALL of the predictor values. So, with your model it is not possible to talk about cutoffs on just your LAB predictor: each cutoff is determined by both LAB and VISIT. If you remove VISIT from the MODEL statement, then you can add the OUTROC= option (for example, OUTROC=OR, the data set name used in the code below) to the MODEL statement in your PROC LOGISTIC step and then merge that data set with your FITDAT data set.
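For reference, a minimal sketch of what those two steps might look like. It assumes LAB is the only fixed effect and that the OUTROC= data set is named OR (the name the sort step below reads). VISIT is kept in the CLASS and RANDOM statements so the within-subject correlation is still modeled. Whether OUTROC= returns the curve for a ROC statement when NOFIT is used may depend on your release, so check the log and the contents of OR before relying on it.

```sas
/* Sketch only: LAB is the sole fixed effect, so each predicted
   probability corresponds to a single LAB value */
proc glimmix data=have noclprint;
   class ID VISIT;
   model STATUS(event='1') = LAB / dist=binary link=logit solution;
   random VISIT / subject=ID residual type=cs;
   output out=FITDAT pred(ilink noblup)=predprob;
run;

/* OR is an assumed data set name, chosen to match the merge code */
proc logistic data=FITDAT;
   model STATUS(event='1') = / nofit outroc=or;
   roc 'GLMM Model' pred=predprob;
run;
```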

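/* rename the predicted probability so it matches the _PROB_ key in the OUTROC= data set */
proc sort data=fitdat out=fitdat2(rename=(predprob=_PROB_)); by predprob; run;
/* OR is the data set created by the OUTROC= option */
proc sort data=or out=or2; by _prob_; run;
/* one row per cutoff, with the LAB value that produced it */
data or3; merge fitdat2 or2; by _prob_; run;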

This allows you to have a data set (OR3) that shows the LAB value corresponding to each cutoff. That data set also has the cell counts of the 2x2 table associated with each cutpoint and the sensitivity and 1-specificity statistics. Using those, you can easily compute the other statistics you want as shown in this note on computing various 2x2 table statistics.
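As a rough illustration of that last step, the DATA step below derives the remaining statistics from the standard OUTROC= variables (_POS_, _NEG_, _FALPOS_, _FALNEG_, _SENSIT_, _1MSPEC_). The output data set name OR4 and the new variable names are made up for this sketch. The estimates treat every visit as an independent observation, so they are descriptive only and do not account for the repeated measures.

```sas
data or4;   /* OR4 is a hypothetical name */
   set or3;
   /* OUTROC= counts at each cutoff:
      _POS_ = true positives,     _NEG_ = true negatives,
      _FALPOS_ = false positives, _FALNEG_ = false negatives */
   n_total     = _POS_ + _NEG_ + _FALPOS_ + _FALNEG_;
   accuracy    = (_POS_ + _NEG_) / n_total;
   sensitivity = _SENSIT_;
   specificity = 1 - _1MSPEC_;
   youden      = sensitivity + specificity - 1;

   /* guard against empty rows or columns of the 2x2 table */
   if _POS_ + _FALPOS_ > 0 then ppv = _POS_ / (_POS_ + _FALPOS_);
   if _NEG_ + _FALNEG_ > 0 then npv = _NEG_ / (_NEG_ + _FALNEG_);
   if _1MSPEC_ > 0 then lr_pos = sensitivity / _1MSPEC_;
   if specificity > 0 then lr_neg = (1 - sensitivity) / specificity;
run;

/* list cutoffs from the largest Youden index down */
proc sort data=or4; by descending youden; run;
proc print data=or4;
   var LAB _PROB_ accuracy sensitivity specificity ppv npv lr_pos lr_neg youden;
run;
```

Statistics whose denominator is zero at a given cutoff are left missing rather than forced to 0 or 1.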


Ksharp · Posted 07-21-2023 07:45 AM

https://support.sas.com/kb/24/170.html

Merdock · Posted 07-21-2023 11:26 AM

@Ksharp, thanks for providing this link, though it looks like this example is specifically for cross-sectional data, whereas my data is longitudinal (repeated measures on the same participants over time). So I'm not sure how, or if, this can be adapted for my case, where the status can change from one visit to another (for example, ID#a001 has STATUS=0 at visits 1, 2, 4 but STATUS=1 at visit 3, and so on). Maybe this would actually be more of a question for the Statistical Procedures group, but I guess I'm a bit confused about whether I'm going down the right path and what the best way is to use ROC analysis to assess the diagnostic performance of my LAB predictor...

Ksharp · Posted 07-22-2023 04:00 AM

Yes. It would be better to post it in the Statistical Procedures forum.

https://communities.sas.com/t5/Statistical-Procedures/bd-p/statistical_procedures

Maybe @StatDave could give you a hand.

In my opinion, you should use the predicted values to build a 2x2 contingency table and compute these estimates from it.
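A minimal sketch of that idea, assuming the FITDAT data set and the predprob variable from the GLIMMIX step earlier in the thread. The cutoff of 0.5 and the names CLASSIFIED and PREDICTED are placeholders, not recommendations.

```sas
%let cut = 0.5;   /* hypothetical cutoff on the predicted probability */

data classified;
   set FITDAT;
   /* 1 = predicted severe, 0 = predicted not severe */
   if missing(predprob) then predicted = .;
   else predicted = (predprob >= &cut);
run;

proc freq data=classified;
   /* rows = prediction, columns = observed STATUS */
   tables predicted*STATUS / nopercent;
run;
```

In the resulting table, the column percentages give sensitivity (predicted=1 within STATUS=1) and specificity (predicted=0 within STATUS=0), and the row percentages give PPV and NPV.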

PaigeMiller · Posted 07-22-2023 07:05 AM

A Google search turns up answers:

https://communities.sas.com/t5/Statistical-Procedures/Logistic-regression-with-repeated-measures/td-...

https://support.sas.com/resources/papers/proceedings/proceedings/sugi27/p261-27.pdf

https://support.sas.com/resources/papers/proceedings/proceedings/sugi22/STATS/PAPER278.PDF

You can find other links as well.

--
Paige Miller

