Hierarchical logistic regression: choice of a mode...


05-10-2017 04:35 PM

Good evening everyone,

I am new to hierarchical logistic regression and would like your help with a practical problem.

I need to analyse potential predictors of early recanalisation (dependent variable, binary) in an observational study of 352 patients with acute stroke. Patients had brain imaging and thrombolysis therapy at the nearest stroke center (nearest_center, n=15) and all patients were subsequently transferred to a comprehensive stroke center (comprehensive_center, n=4) to receive endovascular therapy. One important point: sometimes the nearest center was a comprehensive stroke center and secondary transfer was obviously not needed.

I consider using a mixed model to take into account potential heterogeneity across centers.

When I run a three-level hierarchical model with proc glimmix (level-1 = patient; level-2= nearest center; level-3 = comprehensive center), I have no convergence problem. Here is the code that I used (random intercept model, fixed slopes). Thrombus_length is a continuous variable and fvh is a binary variable:

```
proc glimmix method=laplace ;
  class comprehensive_center nearest_center fvh (ref='0') ;
  model recanalisation_complete (event='1') = thrombus_length fvh / dist=binary link=logit solution oddsratio ;
  random intercept / subject=comprehensive_center ;
  random intercept / subject=nearest_center(comprehensive_center) ;
run;
```

My first question is: is it correct to use such a model, since for some patients the comprehensive center and the nearest center are the same?

I also considered using a simpler model, with two hierarchical levels (level-1=patient; level-2= nearest center):

```
proc glimmix method=laplace ;
  class nearest_center fvh (ref='0') ;
  model recanalisation_complete (event='1') = thrombus_length fvh / dist=binary link=logit solution oddsratio ;
  random intercept / subject=nearest_center ;
run;
```

However, in this situation I get the following message: "WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed." I tried many of the options proc glimmix offers for controlling the optimization process (http://support.sas.com/resources/papers/proceedings12/332-2012.pdf), without success. The culprit seems to be the thrombus_length variable, because I get the above message only when this variable is in the model (even if it is the only independent variable entered).

My second question is: could this error message reflect that the number of patients (or the number of patients who recanalize) in each nearest_center is too small to reasonably allow the use of a hierarchical logistic model? Is there a rule of thumb regarding this point? The number of patients in each nearest_center ranges from 6 to 44, and the number of patients who recanalize ranges from 0 to 11 (total number of patients who recanalize: 61).
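For reference, here is how the per-center counts can be tabulated (a sketch; the data set name `stroke` is a placeholder for my analysis data set):

```sas
/* Cross-tabulate recanalisation events by nearest center;
   "stroke" is a placeholder data set name */
proc freq data=stroke;
  tables nearest_center * recanalisation_complete / nopercent norow nocol;
run;
```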

Many thanks for your help.

Guillaume


Posted in reply to Undertow

05-11-2017 12:44 PM - edited 05-11-2017 01:00 PM

Hey Guillaume,

I work with hospital cost, charge, and length-of-stay data, using PROC GLIMMIX, so hopefully I can provide some insight. From what I can tell, you've set up your models pretty well! Unfortunately (as I have found), no matter what you do, there can sometimes be hangups in the modeling process, given the specifics of your data.

**Here are a few specifics about my data/models, so that you can make the comparison a little easier:**

- All of my outcomes are continuous (Gamma distribution) or counts (Negative Binomial distribution)
- I'm using a 2-level model

**Here are the things that helped me (most of this is from the SAS help docs for GLIMMIX):**

**Using the PROC GLIMMIX statement:**

- I ran into the "obtaining minimum variance quadratic unbiased estimates...failed" problem as well.
- Increase the number of iterations used for the starting-value computation by setting **INITITER** to something high. (I've found the compute time and resource costs are close to nothing for this option.)
- *Note*: based on stuff I've read, you want to be cautious when using the **Laplace** method with binary data, as with too few samples per cluster, or not enough clusters, you may get biased results; however, the small-sample bias diminishes quickly given enough clusters or enough observations within clusters, according to SAS. The alternative is the **Quad** method; if it ends up using only 1 quadrature point in the output statistics, then you'll probably be just fine using **Laplace**. You can always manually set **Quad(Qpoints=7)** and compare the results to those of **Laplace**. *Also, prepare yourself: this is going to take significantly longer...*
- If you think that small samples, or few observations per cluster, are an issue, you can set the **EMPIRICAL** option to "**MBN**".
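To make that quadrature-vs-Laplace comparison concrete, here is a minimal sketch with your variables (fit this alongside the method=laplace run and compare the fit statistics and estimates):

```sas
/* Adaptive quadrature with 7 points; compare against method=laplace */
proc glimmix method=quad(qpoints=7);
  class nearest_center fvh (ref='0');
  model recanalisation_complete (event='1') = thrombus_length fvh
        / dist=binary link=logit solution oddsratio;
  random intercept / subject=nearest_center;
run;
```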

**Using the MODEL statement:**

- You probably won't have to do this; however, I've found that setting the **DDFM** option to "**BW**" generally speeds things up quite a bit.
- I've also noticed that in my models, scaling any continuous variables seems to help quite a bit, so you might consider doing that.
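For the scaling point, one way to standardize a continuous predictor is PROC STDIZE (a sketch; the data set name `stroke` is a placeholder):

```sas
/* Center and scale thrombus_length to mean 0, SD 1 */
proc stdize data=stroke out=stroke_std method=std;
  var thrombus_length;
run;
```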

**Using the RANDOM statement:**

- You probably won't have to do this either; however, if the covariance estimates you get seem a little off, I would set the **TYPE** option to "**Chol**" for the covariance structure, as I've found that it makes things a bit more stable (the help guide recommends it over the "**UN**" structure; although, with only a random intercept, it shouldn't be that big of an issue).
- I would output the **random intercept solution** just to make sure that you aren't ending up with ridiculous estimates/confidence intervals. They might also be useful for comparison later on. You set these with the **S** and **CL** options.

**Using the PARMS statement:**

- Try examining the covariance parameters from a simpler model and supplying them as starting values for the 2-level model with the **PARMSDATA=** option of the **PARMS** statement.
- If that doesn't work, *make your own starting values* with something like this (change the values to a range that you think is appropriate for your data). Also, you have to name the variables **CovP1**, **CovP2**, as that is what SAS expects to see when you pass them in through the **PARMS** statement:

```
data baseParms;
  CovP2 = .01; /* Use your own value here to start the loop; this probably won't be useful for your data */
  do while (CovP2 < .1);
    CovP1 = .01; /* Use your own value here to start the second loop; again, it may not be useful for your data */
    do while (CovP1 <= .1);
      output;
      CovP1 = CovP1 + .01;
    end;
    CovP2 = CovP2 + .01;
  end;
run;
```

**Using the NLOPTIONS statement:**

- Jack **MAXITER** and **MAXFUNC** way up there.
- Given that you are working with binary data, I think I've read that the suggested optimization technique is **NRRIDG**. Although, if I'm wrong about that, I hope people chime in.
- If you run into convergence issues (which seems to happen sometimes), try setting **GCONV** to something greater than the default of **1e-8**. Honestly, I would set it to **1e-4** so that the model runs faster; then, once you know the model works, set it back to **1e-8**.

Here is what I might try:

```
ODS SELECT CovParms;
proc glimmix method=laplace ;
  class nearest_center fvh (ref='0') ;
  model recanalisation_complete (event='1') = / dist=binary link=logit solution oddsratio ;
  random intercept / subject=nearest_center ;
  ODS OUTPUT CovParms=Parms;
run; quit;
ODS SELECT ALL;
```

```
ODS SELECT FitStatistics CovParms Tests3 OddsRatios;
proc glimmix method=laplace inititer=100 empirical=mbn;
  class nearest_center fvh (ref='0') ;
  model recanalisation_complete (event='1') = thrombus_length fvh
        / dist=binary link=logit solution oddsratio ;
  parms / ParmsData=Parms; /* These are the values from above. Optionally use the baseParms data set if the simpler model doesn't work */
  random intercept / subject=nearest_center s cl;
  nloptions
    tech=nrridg
    gconv=1e-6
    maxiter=10000 maxfunc=10000;
  ods output SolutionR=RandomEffects;
  title "Model Results";
run; quit;
ODS SELECT All;
```

Hopefully that helps! And if I've given any inaccurate advice, I hope there will be some corrections!


Posted in reply to lkeyes

05-11-2017 05:22 PM

Dear lkeyes,

Many thanks for your detailed answer, which was very helpful, even though I am disappointed by the results.

When I run the 2-level unconditional model (without predictors), I get the following covariance parameter estimates: intercept estimate 0.12, StdErr 0.13. The estimate and StdErr of each center's intercept seem to be correctly displayed in the "solution for random effects" table. DF = 409.

Thanks to the addition of the "parms / ParmsData=Parms;" statement, I am now able to obtain convergence for the 2-level model containing only a random intercept and the continuous thrombus_length variable (fixed slope). However, in the "solution for random effects" table, the estimate and StdErr of each center's intercept are now equal to zero.

I also realized that in the 3-level model described in my earlier post, the estimate and StdErr of each center's intercept (random effects) are also equal to zero.

Every time the thrombus_length variable is entered into a hierarchical logistic model, this problem occurs. Rescaling the variable to [0, 1] or making it categorical (tertiles) did not change the situation. This is the only variable leading to the problem. Unfortunately, it is the most important predictor, so it cannot simply be dropped from the model.

Could you please confirm that both models are invalid? Or equivalent to a simple (non-hierarchical) logistic regression? Is this due to the number of events (recanalisation = yes) per center being too small?
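In case it clarifies what I mean by "equivalent to a simple logistic regression", here is the check I am considering (a sketch; the COVTEST statement requires a likelihood-based method such as method=laplace):

```sas
/* Likelihood-ratio test of H0: center-level variance = 0.
   If H0 is not rejected, the fit is effectively an ordinary
   (non-hierarchical) logistic regression. */
proc glimmix method=laplace;
  class nearest_center fvh (ref='0');
  model recanalisation_complete (event='1') = thrombus_length fvh
        / dist=binary link=logit solution oddsratio;
  random intercept / subject=nearest_center;
  covtest 'center-level variance = 0' zerog;
run;
```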

Thank you for your response.

Guillaume