Good evening everyone,
I am new to hierarchical logistic regression and would like to have your help on a practical problem.
I need to analyse potential predictors of early recanalisation (dependent variable, binary) in an observational study of 352 patients with acute stroke. Patients had brain imaging and thrombolysis therapy at the nearest stroke center (nearest_center, n=15) and all patients were subsequently transferred to a comprehensive stroke center (comprehensive_center, n=4) to receive endovascular therapy. One important point: sometimes the nearest center was a comprehensive stroke center and secondary transfer was obviously not needed.
I consider using a mixed model to take into account potential heterogeneity across centers.
When I run a three-level hierarchical model with proc glimmix (level-1 = patient; level-2= nearest center; level-3 = comprehensive center), I have no convergence problem. Here is the code that I used (random intercept model, fixed slopes). Thrombus_length is a continuous variable and fvh is a binary variable:
proc glimmix method=laplace ;
class comprehensive_center nearest_center fvh (ref='0') ;
model recanalisation_complete (event='1') = thrombus_length fvh / dist=binary link=logit solution oddsratio ;
random intercept / subject=comprehensive_center ;
random intercept / subject=nearest_center(comprehensive_center) ;
run;
My first question is: is it correct to use such a model since for some patients the comprehensive center and nearest center are the same ?
I also considered using a simpler model, with two hierarchical levels (level-1=patient; level-2= nearest center):
proc glimmix method=laplace ;
class nearest_center fvh (ref='0') ;
model recanalisation_complete (event='1') = thrombus_length fvh / dist=binary link=logit solution oddsratio ;
random intercept / subject=nearest_center ;
run;
However, in this situation I get the following message: "Warning: obtaining minimum var. quadratic unbiased estimates as starting values for the covariance parameters failed." I tried many options proc glimmix offers for controlling the optimization process (http://support.sas.com/resources/papers/proceedings12/332-2012.pdf), without success. The culprit seems to be the thrombus_length variable, because I get the above error message only when this variable is in the model (even if it is the only independant variable entered).
My second question is: could this error message reflect that the number of patients (or the number of patients who recanalize) in each nearest_center is too small to reasonably allow the use of a hierarchical logistic model ? Is there a rule of thumb regarding this point ? The number of patients in each nearest_center ranges from 6 to 44, and the number of patients who recanalize ranges from 0 to 11 (total number of patients who recanalize: 61).
Many thanks for your help.
Guillaume
Hey Guillaume,
I work with hospital cost, charge, and length of stay data, using PRC GLIMMIX. Hopefully I can provide some insight. So, from what i can tell, you've set up your models pretty well! Unfortunately, (as i have found) no matter what you do, there sometimes can be hangups with the modeling process, given the specifics of your data.
Here are a few specifics about my data/models - so that you can make a comparison a little easier.
Here are the things that helped me: (most all this is from the SAS help docs for GLIMMIX)
data baseParms;
CovP2 = .01; /*Use your own value here to start the loop, this probably won't be useful for your data*/
do while (CovP2 < .1);
CovP1 = .01; /*Use your own value here to start the second loop, again it may not be useful for your data*/
do while (CovP1 <= .1);
output;
CovP1 = CovP1 + .01;
end;
CovP2 = CovP2 + .01;
end;
run;
Here is what i might try:
ODS SELECT CovParms;
proc glimmix method=laplace ;
class nearest_center fvh (ref='0') ;
model recanalisation_complete (event='1') = / dist=binary link=logit solution oddsratio ;
random intercept / subject=nearest_center ;
ODS OUTPUT CovParms=Parms;
run;
quit;
ODS SELECT ALL;
ODS SELECT FitStatistics CovParms Test3 OddsRatios;
proc glimmix method=laplace inititer=100 empirical=mbn;
class nearest_center fvh (ref='0') ;
model recanalisation_complete (event='1') = thrombus_length fvh
/dist=binary link=logit solution oddsratio;
parms /ParmsData=Parms; /*These are the values from above. Optionally use the "baseparms" file from above if the simpiler model doesn't work*/
random intercept / subject=nearest_center s cl;
nloptions
tech=nrridg
gconv=1e-6
maxiter=10000 maxfunc=10000;
ods output SolutionR=RandomEffects;
title "Model Results";
run;
quit;
ODS SELECT All;
Hopefully that helps!! And if i've given any inaccurate advice, i hope that there are some corrections!!
(also my code didn't really output well in the box, but just pretend i structured it super well)
Dear lkeyes,
Many thanks for your detailed answer, which was very helpful, even though I am disappointed by the results.
When I run the 2-level unconditional model (without predictors), I have the following covariance parameter estimates: intercept estimate 0.12, StdErr 0.13. Values for the estimate and StdErr of each center's intercept seem to be correctly displayed In the "solution for random effects" table. DF=409.
Thanks to the addition of the "parms /ParmsData= Parms;" statement, I am now able to obtain convergence of the algorithm for the 2-level model containing only a random intercept and the continuous thrombus_length variable (fixed slope). However, in the "solution for random effects", the estimate and StdErr of each center's intercept are now equal to zero.
I also realized that in the 3-level model described in my earlier post, the estimate and StdErr of each center's intercept (random effects) are also equal to zero.
Every time the thrombus_length variable is entered in a hierarchical logistic model, this problem occurs. Rescaling the variable [0 , 1] or making it categorical (tertiles) did not change the situation. This is the only variable leading to this problem. Unfortunately, it is the most important predictor and therefore it cannot be simply ruled out of any model.
Could you please confirm that both models are invalid ? Or equivalent to a simple (non-hierarchical) logistic regression ? Is this due to the fact that the number of events (recanalisation=yes) per center is too small ?
Thank you for your response.
Guillaume
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.