I have a dataset (data1) with assessments taken at different visits: baseline (visit=1), Day 8 (visit=2), Day 30 (visit=3), Day 60 (visit=4) and Day 180 (visit=5) on an ordinal variable with 5 levels (1=normal, 2=mild, 3=moderate, 4=severe), and a separate dataset (data2) with repeated measures for a continuous variable (biomarker), taken at the same visits (1, 2, 3, 4, 5).
One of the exploratory objectives of my exercise problem (I’m trying to teach myself GLMM) asks to investigate the correlation between the biomarker (as predictor) and the ordinal variable (as outcome). Since this is only an exploratory objective, i.e., the study was not specifically designed for this, not all participants in the study will have assessments performed at all those visits.
Since this is a longitudinal study, with repeated measurements taken on the same subjects, I am thinking of exploring the correlation between the continuous predictor and categorical outcome from baseline to Day 180 by using repeated measures logistic regression, implemented via PROC GLIMMIX.
In preparation for fitting the model, this is what I have done so far:
Outcome variable: Disease status (outcome= 0 means no disease present; outcome=1 means disease)
Independent variables: Biomarker (biom) but also visit as fixed-effect factor since we have repeated measurements
Dataset:
data final;
input id $ visit outcome biom;
datalines;
001 1 0	59.7
001 2 0	78.4
001 4 0	75.2
001 5 1	80.7
002 2 0	64.6
002 5 0	389
003 1 0	618
003 2 0	469
003 3 0	478
004 1 1	404
004 2 0	47.3
004 3 1	64.5
004 4 0	88.8
004 5 0	86.7
005 1 0	88.3
006 1 0	234
007 1 0	245
007 2 0	243
007 3 0	237
007 5 0	226
008 1 0	22.2
008 2 0	25.5
008 5 1	35.5
009 3 0	35.3
009 5 0	30.3
010 1 0	134
010 5 0	167
011 4 0	146
011 5 0	135
012 1 0	140
012 4 0	74.6
012 5 0	72.9
013 1 0	79.1
013 3 0	75.6
013 4 0	68.9
014 2 1	291
014 3 0	21.3
015 1 0	17
015 5 1	15.6
016 1 0	16.1
016 2 0	13.9
016 5 0	99.9
017 1 0	105
017 3 0	96.2
017 4 0	102
017 5 0	89.2
018 3 1	25.9
019 2 0	27.3
019 3 0	26.2
020 2 0	26.2
020 5 0	28.9
021 1 1	74.2
021 3 0	75.1
021 4 0	60.4
021 5 0	62.2
022 1 0	61.4
023 1 0	12.7
024 2 0	12.1
024 3 0	13.4
025 1 0	11.6
025 5 0	11.5
026 1 0	45.9
026 5 0	47.2
027 1 0	39
027 2 0	38.7
027 3 1	42.7
027 4 0	18.4
027 5 0	15.3
028 1 0	15.9
028 2 0	16.1
028 4 0	15.8
029 1 0	57.8
029 2 1	86.7
029 3 1	88.3
029 5 0	234
030 1 0	245
030 3 0	243
030 5 0	237
031 1 0	226
031 2 0	22.2
031 4 0	18.4
032 4 0	15.3
032 5 0	15.9
033 1 0	16.1
033 2 0	78.4
034 2 0	75.2
034 3 0	80.7
035 1 0	64.6
035 2 0	389
035 3 0	618
035 4 0	469
036 1 0	478
037 1 0	152
038 2 0	148
038 3 0	29.12
039 2 0	421
040 1 0	520
040 2 0	478
040 3 0	18.4
041 2 0	15.3
041 4 0	15.9
042 1 0	16.1
043 1 0	78.4
044 1 0	325
044 2 0	478
044 3 0	452
045 2 0	25.8
045 4 0	15.9
045 5 0	16.1
046 1 0	78.4
046 4 0	16.8
; 
run;
data final2;
input id $ visit outcome biom;
datalines;
001 1 0	59.7
001 2 0	78.4
001 4 0	75.2
001 5 1	80.7
004 1 1	404
004 2 0	47.3
004 3 1	64.5
004 4 0	88.8
004 5 0	86.7
007 1 0	245
007 2 0	243
007 3 0	237
007 5 0	226
008 1 0	22.2
008 2 0	25.5
008 5 1	35.5
010 1 0	134
010 5 0	167
012 1 0	140
012 4 0	74.6
012 5 0	72.9
015 1 0	17
015 5 1	15.6
016 1 0	16.1
016 2 0	13.9
016 5 0	99.9
017 1 0	105
017 3 0	96.2
017 4 0	102
017 5 0	89.2
021 1 1	74.2
021 3 0	75.1
021 4 0	60.4
021 5 0	62.2
025 1 0	11.6
025 5 0	11.5
026 1 0	45.9
026 5 0	47.2
027 1 0	39
027 2 0	38.7
027 3 1	42.7
027 4 0	18.4
027 5 0	15.3
029 1 0	57.8
029 2 1	86.7
029 3 1	88.3
029 5 0	234
030 1 0	245
030 3 0	243
030 5 0	237
; 
Run;
Model:
proc glimmix data=final2 method=quad(qpoints=50) noclprint;
class id visit;
model outcome = visit biom biom*visit / noint dist=binomial link=logit solution;
random intercept / subject=id;
output out=fitglmm pred(ilink noblup)=pred;
run;My questions are:
Thank you so much for any help and guidance.
Let's go through these by number:
My questions are:
SteveDenham
Let's go through these by number:
My questions are:
SteveDenham
Hi Steve,
Thank you so much for your answers, that's really helpful!
To address your question about collapsing the mutinomial, I was also planning on doing a ROC analysis to estimate the predictive ability of the biomarker for the detection of disease, where participants with “normal” and “mildly decreased” values of the ordinal outcome variable would be categorized as belonging to the “non-disease” group, while participants with “moderately decreased” and “severely decreased” values would be considered part of the “disease" group. So I would like to assess the diagnostic performance of the biomarker for several cut-offs by computing accuracy, sensitivity, specificity, positive predictive value, negative predictive value and likelihood ratios. This ROC analysis is the reason why I dichotomized the multinomial response variable. I figured that I can use proc glimmix to explore the relationship between outcome and biomarker from baseline to visit 5, and then take the estimated probabilities of positivity for each observation based on the model use them in PROC LOGISTIC to build the ROC curve, though not sure how to programmatically obtain the positive predictive value, negative predictive value and likelihood ratios? Hopefully this makes sense.
Regarding your comment to my question 4 below, about imposing a covariance structure on visit, can I do that by adding another random visit statement, like shown below? I read that proc glimmix does not allow for a repeated statement like proc mixed does but instead we can equivalently use a random statement with the residual option:
proc glimmix data=final2 method=quad(qpoints=50) noclprint;
class id visit;
model outcome= visit biom biom*visit / noint dist=binomial link=logit solution;
random intercept/ subject=patid;
random visit/ subject=patid type=UN residual;
output out=fitglmm pred(ilink noblup)=pred;
run;That code will not run, as method=quad does not support a RANDOM residual structure. Two choices: remove the residual option from the new random statement, or leave it in and drop back to the default pseudo-likelihood method. The first will coincide with more of the published literature, while the second may be more appropriate in controlling Type I error (see Stroup & Claassen https://econpapers.repec.org/RePEc:spr:jagbes:v:25:y:2020:i:4:d:10.1007_s13253-020-00402-6 ) (paywall. I have a copy that I seem to have been able to access through my ASA membership)
SteveDenham
Hi Steve,
Thank you for the suggestions. I tried the following fitting the model on the dataset above and unfortunately there seems to be an issue, as the model does not converge. Could this be due to small sample size (only 15 participants with visit 1 and 5 so total of 30 observations, and only 5 observations with outcome=1 at either visit 1 or 5), or something else?
Additionally, I'm also getting a message that the model does not have an intercept, is that because I included the noint option? That message does go away if I remove the noint. That said, does it even make sense or is it necessary to have a random intercept statement here?
I know that this would probably not cause the issues mentioned but wouldn't it also be better to use dist=binary instead of binomial since my dependent variable is 0/1 as opposed to events/trial?
Sorry for all these questions, I'm completely new to longitudinal/repeated measures analysis and just trying to get a better understanding of things.
proc glimmix data=final2 method=RSPL noclprint;
class id visit;
model outcome = visit biom biom*visit / noint dist=binomial link=logit solution;
random intercept / subject=id;
random visit/ subject=id type=UN residual;
output out=fitglmm pred(ilink noblup)=pred;
run;You might have an overspecified model.
Might want to try taking out the RANDOM intercept statement, or using a different covariance structure for the R-side, such as type=AR(1).
Hi Jiltao, thanks for your suggestions. I took out the interaction term (visit*biom), in addition to also trying both the AR(1) covariance structure, as well as removing the random intercept statement but it still doesn’t converge.
Here is what I'm getting in each case:
When I delete the random intercept and use type=AR(1):
Versus when I delete random intercept but still use type=UN:
Could this be due maybe to lack of variability in the response variable which would result in a lack of predictive capability, since I only have 9 observations with outcome=1 vs 41 obs with outcome=0?
The main issue I see in those outputs is that iterations stop at step 19 in the first and that the pseudo-likelihood update fails in outer iteration 6 in the second. Additionally in the first, the objective function is not proceeding smoothly, with a jump-reset at iteration 14.
Things to try (no guarantee): Add an NLOPTIONS statement, changing the technique to ridged Newton-Raphson [tech=NRRIDG] and allowing the number of iterations to increase [MAXITER=1000].
Change the method to Laplace. This would require removing the residual option from the RANDOM statement.
Adjust the convergence criterion (see the paper here https://www.lexjansen.com/mwsug/2012/SA/MWSUG-2012-SA15.pdf ) for some examples.
SteveDenham
Hi Steve, I tried adding the options you suggested and it looks like that solved the convergence issue though now I’m getting another message saying the G matrix is not positive definite. From what I've been reading one of the possible causes for this might be that there is not enough variation in the response (which would be the case for my data). I also tried removing the random intercept statement and that took care of message about G matrix not being PD, but instead I started getting a message in the output table saying something about the convergence status being indeterminate. Any further ideas or suggestions? I'm starting to think that maybe a GLMM is not that well suited given the scarcity of my data and I should just use good old summary statistics instead of attempting to do any modeling.
That looks much like what I found. You might try fitting just an R-side GLM in GLIMMIX or a GEE using GENMOD or GEE - no RANDOM statement. I think part of the problem with the indeterminate convergence comes from overspecifying the R matrix. Rather than UN, try CS. If you switch to a GEE, those two PROCs call that an exchangeable structure.
SteveDenham
Hi Steve,
You're right - I tried various options and it converges if I comment out the random intercept statement, change the covariance structure to CS for the R matrix and add that NLOPTIONS statement at the end. I might also try playing around with a GEE to see the difference but ultimately I think this is definitely progress. How meaningful the results might be or how well this model fits my data is a different story but at least I got it to converge :). Thank you sooo much for all your step-by-step help, I truly appreciate it!
proc glimmix data=final2 noclprint;
class id visit (ref="1");
model outcome(event="1")= biom visit/ / dist=binary link=logit solution;
random visit/ subject=id residual type=cs;
output out=fitglmm pred(ilink noblup)=pred;
NLOPTIONS tech=NRRIDG Maxiter=1000;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.
