Merdock
Quartz | Level 8

I have a test dataset, similar to the one below:

data have;
infile datalines expandtabs; /* the data lines below are tab-separated */
input ID$ CAT$ GROUP$ VISIT$ LAB STATUS$ BSL_CAT$;
datalines;
a001	1	1	1	1997.02	0	1
a001	1	1	2	1275.52	0	1
a001	4	1	3	180.23	1	1
a001	2	1	4	735.91	0	1
a002	1	2	1	454.16	0	1
a002	1	2	3	1776.52	0	1
a002	3	2	4	73.15	1	1
a003	1	2	1	1700.26	0	1
a003	3	2	2	1621.32	1	1
a003	2	2	4	850.65	0	1
a004	2	3	1	1963.25	0	2
a004	2	3	2	544.87	0	2
a004	4	3	3	768.54	1	2
a004	2	3	4	780.16	0	2
a005	1	2	1	655.24	0	1
a005	2	2	4	722.14	0	1
a006	1	1	1	1472.06	0	1
a006	1	1	4	749.78	0	1
a007	2	1	1	848.88	0	2
a007	2	1	2	1482.78	0	2
a007	3	1	4	735.26	1	2
a008	1	1	1	1752.35	0	1
a008	1	1	2	1698.82	0	1
a008	3	1	3	1871.25	1	1
a008	4	1	4	587.35	1	1
a009	1	3	1	1549.89	0	1
a009	3	3	3	785.52	1	1
a009	1	3	4	384.72	0	1
a010	3	3	1	1211.95	1	3
a010	3	3	4	1596.38	1	3
a011	4	1	1	1785.45	1	4
a011	4	1	4	644.12	1	4
a012	3	3	1	798.28	1	3
a012	3	3	2	742.69	1	3
a012	3	3	3	1423.59	1	3
a012	3	3	4	1089.47	1	3
;
run;
proc print data=have; 
run;

where CAT is an ordinal categorical variable with 4 levels;

GROUP denotes the age group the participants are in;

VISIT is the follow up visit - participants have baseline (visit=1), and afterwards they can have up to 3 additional follow-up visits (visit=2,3,4);

LAB is a specific laboratory value;

STATUS is a binary variable denoting the severity of their disease and is based on CAT: if CAT in (1,2) then STATUS=0 (not severe), else if CAT in (3,4) then STATUS=1 (severe);

BSL_CAT is the baseline value of CAT;

 

I would like to estimate the predictive ability of LAB as a marker for the detection of disease severity (STATUS severe vs not severe). I want to look at several cut-offs for LAB variable and assess its diagnostic performance by computing accuracy, sensitivity, specificity, positive predictive value, negative predictive value and likelihood ratios.

How can I do this?

 

Here is what I was thinking/I've done so far: 

1. Use PROC GLIMMIX (given the longitudinal nature of the observations) to investigate the association between STATUS as a binary outcome and LAB and VISIT as covariates, with VISIT as a random effect.

2. Then take the predicted probabilities from this model and feed them into PROC LOGISTIC to get a ROC curve. After this I'm pretty much stuck, as I don't know how to move forward and obtain accuracy, sensitivity, specificity, PPV, NPV and likelihood ratios.

*MODEL OUTCOME AS BINARY;
proc glimmix data=have noclprint;
class ID VISIT (ref="1");
model STATUS (event='1')= LAB VISIT/ dist=binary link=logit solution;
random VISIT/subject=ID residual type=cs;
output out=FITDAT pred(ilink noblup)=predprob;
NLOPTIONS tech=NRRIDG Maxiter=1000; 
run;
proc print data=FITDAT; run;

*ROC CURVE BASED ON PREDICTED PROBABILITIES FROM GLIMMIX;
proc logistic data=FITDAT;
model STATUS (event="1")= / nofit;
roc 'GLMM Model' pred=predprob;
run;

Does anybody have any code or suggestions they can share to help?

Thank you kindly.

Accepted Solution
StatDave
SAS Super FREQ

If your final goal is to find an optimal cutoff, then note that there are statistics (like Youden's index and others) that are often used for that. These can be obtained using the ROCPLOT macro (or in PROC LOGISTIC if you have a recent version of SAS Viya).  However, note that the unique predicted probabilities, which are the cutoffs used for the ROC curve, are computed using ALL of the predictor values. So, it is not possible to talk about cutoffs on just your LAB predictor with your model. Each cutoff is determined by both LAB and VISIT using your model. If you remove VISIT from the MODEL statement then you can add the OUTROC= option in the MODEL statement in your PROC LOGISTIC step and then merge that data set together with your FITDAT data set.
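A minimal sketch of the refit described above, assuming the GLIMMIX and LOGISTIC steps from the question are otherwise unchanged. VISIT is dropped from the MODEL statement only (it stays in CLASS and RANDOM to describe the repeated measures), and OUTROC=OR is added to the PROC LOGISTIC MODEL statement. The data set name OR is simply chosen to match the merge code in this answer. Convergence on a data set this small is not guaranteed.

```sas
/* Refit with LAB as the only fixed effect, so that each predicted
   probability corresponds to a single LAB value */
proc glimmix data=have noclprint;
   class ID VISIT;
   model STATUS(event='1') = LAB / dist=binary link=logit solution;
   random VISIT / subject=ID residual type=cs;
   output out=FITDAT pred(ilink noblup)=predprob;
   nloptions tech=nrridg maxiter=1000;
run;

/* Save the ROC points for the GLIMMIX predictions in data set OR:
   cutoffs (_PROB_), 2x2 cell counts, sensitivity and 1-specificity */
proc logistic data=FITDAT;
   model STATUS(event='1') = / nofit outroc=or;
   roc 'GLMM Model' pred=predprob;
run;
```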

* OR is assumed to be the data set named in OUTROC=OR in the PROC LOGISTIC step;
proc sort data=fitdat out=fitdat2(rename=(predprob=_PROB_)); by predprob; run;
proc sort data=or out=or2; by _prob_; run;
data or3; merge fitdat2 or2; by _prob_; run;

This allows you to have a data set (OR3) that shows the LAB value corresponding to each cutoff.  That data set also has the cell counts of the 2x2 table associated with each cutpoint and the sensitivity and 1-specificity statistics. Using those, you can easily compute the other statistics you want as shown in this note on computing various 2x2 table statistics.
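As a rough illustration of that last step, the sketch below derives the remaining statistics from the standard OUTROC= variables carried into OR3: _POS_ (true positives), _NEG_ (true negatives), _FALPOS_ (false positives), _FALNEG_ (false negatives), _SENSIT_ and _1MSPEC_. The output data set name OR_STATS and the new variable names are illustrative choices, not part of the original answer. The counts pool all visits, so they do not account for repeated measures on the same participant.

```sas
data or_stats;
   set or3;
   spec     = 1 - _1mspec_;                       /* specificity */
   accuracy = (_pos_ + _neg_)
              / (_pos_ + _neg_ + _falpos_ + _falneg_);
   /* guard against empty margins at extreme cutoffs */
   if _pos_ + _falpos_ > 0 then ppv = _pos_ / (_pos_ + _falpos_);
   if _neg_ + _falneg_ > 0 then npv = _neg_ / (_neg_ + _falneg_);
   if _1mspec_ > 0 then lr_pos = _sensit_ / _1mspec_;       /* LR+ */
   if spec > 0     then lr_neg = (1 - _sensit_) / spec;     /* LR- */
   youden = _sensit_ + spec - 1;                  /* Youden's index */
run;

proc print data=or_stats;
   var lab _prob_ _sensit_ spec accuracy ppv npv lr_pos lr_neg youden;
run;
```

Sorting OR_STATS by descending youden is one simple way to look for a candidate cutoff.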


Replies
Merdock
Quartz | Level 8

@Ksharp, thanks for providing this link, though it looks like this example is specifically for cross-sectional data, whereas my data is longitudinal (repeated measures on the same participants over time). So I'm not sure how, or if, this can be adapted to my case, where the status can change from one visit to another (for example, ID a001 has STATUS=0 at visits 1, 2 and 4 but STATUS=1 at visit 3). Maybe this would actually be more of a question for the Statistical Procedures group, but I guess I'm a bit confused about whether I'm going down the right path and what the best way is to use ROC analysis to assess the diagnostic performance of my LAB predictor...

Ksharp
Super User

Yes. It would be better to post it in the Statistical Procedures forum.

https://communities.sas.com/t5/Statistical-Procedures/bd-p/statistical_procedures

 

Maybe @StatDave  could give you a hand.

 

In my view, you should use the predicted values to build a 2x2 contingency table and get these estimates from it.
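A minimal sketch of the 2x2-table idea, applied here directly to LAB in the HAVE data set rather than to model predictions. It rests on assumptions that are not from the thread: the cutoffs 500, 750 and 1000 are arbitrary, and LAB >= cutoff is treated as "test positive" (flip the comparison if low values indicate severe disease). All visits are pooled, so within-participant correlation is ignored.

```sas
data cut_stats (keep=cutoff tp fp fn tn accuracy sens spec ppv npv lr_pos lr_neg);
   array cuts[3] _temporary_ (500 750 1000);   /* illustrative cutoffs */
   do c = 1 to dim(cuts);
      cutoff = cuts[c];
      tp = 0; fp = 0; fn = 0; tn = 0;
      /* one pass over HAVE per cutoff to fill the 2x2 table */
      do i = 1 to n;
         set have point=i nobs=n;
         test_pos = (lab >= cutoff);           /* assumed direction */
         if status = '1' and test_pos then tp = tp + 1;
         else if status = '1' then fn = fn + 1;
         else if test_pos then fp = fp + 1;
         else tn = tn + 1;
      end;
      call missing(ppv, npv, lr_pos, lr_neg);
      accuracy = (tp + tn) / (tp + fp + fn + tn);
      sens = tp / (tp + fn);
      spec = tn / (tn + fp);
      if tp + fp > 0 then ppv = tp / (tp + fp);
      if tn + fn > 0 then npv = tn / (tn + fn);
      if spec < 1 then lr_pos = sens / (1 - spec);
      if spec > 0 then lr_neg = (1 - sens) / spec;
      output;
   end;
   stop;   /* required because POINT= never hits end of file */
run;

proc print data=cut_stats; run;
```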

