Re: GLIMMIX and intraclass correlation coefficient interpretation?

PharmlyDoc · Posted 03-04-2022 01:40 PM

I am trying to understand how to interpret the intraclass correlation coefficient, which I am calculating from proc glimmix.

• For this example I am using the Hospital, Doctor, Patient (HDP) dataset from UCLA. https://stats.oarc.ucla.edu/r/codefragments/mesimulation/

• I will be modeling for the outcome of cancer remission and my fixed predictors will be IL6, CancerStage, Age, and FamilyHx, and patients are nested within hospitals (HID)

• I will assume that I do not have doctor ID's (DID), so I will not be nesting by doctors, nor do I have variables describing the characteristics of hospitals

• I want to know the odds of remission by the specified predictors, while also accounting for differences in remission between or within hospitals - which is where the intraclass correlation coefficient comes in.

• These resource states to calculate the ICC from the "empty, unconditional model with no predictors", only the random intercept: https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf , https://www.lesahoffman.com/CLDP945/CLDP945_Example07a_Binary_Clustered.pdf , https://www.lexjansen.com/sesug/2015/173_Final_PDF.pdf , https://support.sas.com/resources/papers/proceedings12/350-2012.pdf – Do I calculate the ICC from the random intercept model, or from the full model? How do I interpret the ICC?

With the random intercept model, I calculate an ICC of .093. Does this mean that 9% of the variability in going into remission is accounted for by the hospitals? --> ICC = Covariance Parameter Estimate/(Covariance Parameter Estimates + 3.29) -->
ICC = 0.3366/(0.3366+3.29) = .093

filename csvFile url "https://stats.idre.ucla.edu/stat/data/hdp.csv" termstr=crlf; 
 
proc import datafile=csvFile  
dbms=csv  
out=hdp ; 
getnames=yes; 
guessingrows=3000; 
run; 

proc sql;
select
distinct(FamilyHx)
from hdp;
quit;


proc glimmix data=hdp PLOTS=(ALL ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT)) namelen=30 ; 
class HID  ; 
   model remission(event='1') = 
/ distribution=binary link=logit cl  oddsratio solution ddfm=betwithin; 
   random intercept / subject=HID solution CL  ;
COVTEST /WALD; 
run; 
quit; 
/* Covariance Parameter Estimates for HID = 0.3366  
ICC = Covariance Parameter Estimate/(Covariance Parameter Estimates + 3.29) 
ICC = 0.3366/(0.3366+3.29)  = .093
9% of the variability in going into remission is accounted for by the hospitals? */

With the full model, I calculate an ICC of 10%.

proc glimmix data=hdp PLOTS=(ALL ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT)) namelen=30 ; 
class HID CancerStage(ref='I') FamilyHx(ref="no"); 
   model remission(event='1') = IL6 CancerStage Age FamilyHx
/ distribution=binary link=logit cl  oddsratio solution ddfm=betwithin; 
   random intercept / subject=HID solution CL ; 
COVTEST / WALD;
run; 
quit; 
/* Covariance Parameter Estimates for HID = 0.3721 
ICC = Covariance Parameter Estimate/(Covariance Parameter Estimates + 3.29) 
ICC = 0.3721/(0.3721+3.29)  = .1016
10% of the variability in going into remission is accounted for by the hospitals? */

Thank you.

sbxkoenk · Posted 03-05-2022 05:30 AM

Hello,

I am sorry for not answering your question directly, but I just wanted to make you aware about this :

S A S S A M P L E L I B R A R Y
Intraclass Correlations

https://support.sas.com/documentation/onlinedoc/stat/ex_code/131/intracc.html

Also, please visit https://www.lexjansen.com/

Then enter 'intraclass correlation' as search terms and hit the magnifying glass.

Good luck,

Koen

PharmlyDoc · Posted 03-05-2022 11:47 AM

@sbxkoenk wrote:

Hello,

I am sorry for not answering your question directly, but I just wanted to make you aware about this :

S A S S A M P L E L I B R A R Y
Intraclass Correlations

https://support.sas.com/documentation/onlinedoc/stat/ex_code/131/intracc.html

Also, please visit https://www.lexjansen.com/

Then enter 'intraclass correlation' as search terms and hit the magnifying glass.

Good luck,

Koen

Thanks for the feedback. The example from the SAS Sample Library is unfortunately not helpful because it calculates inter-rater reliability. I searched lexjansen.com for "intraclass correlation" AND "3.29" and for "intraclass correlation" AND "glimmix", and the pertinent responses are listed in my original post. I'm now looking for insight from other experts.

Interestingly, a 2018 SAS paper states:

"How Do You Compute an Intra-Class Correlation Coefficient (ICC) for a Non-Normal Response Model in PROC GLIMMIX (Binary, Multinomial, and so on)? SAS is not aware of a generally accepted method for calculating an ICC in a logistic model, mainly because there is no concept of a residual variance in a logistic regression model. You can form your own ratios of the variance components in your model, but SAS does not endorse such an ICC calculation for a non-normal response model." - https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2179-2018.pdf

I wonder if SAS has since changed their position on endorsing an ICC calculation for a binary response response model? @jiltao

SteveDenham · Posted 03-08-2022 08:46 AM

I doubt that there has been a change in the position. The variance component ratios can be calculated, but the are NOT correlations. In particular, there is no "residual" variance for the binomial distribution as the variance is a function of the point estimate, and changes for each level of a categorical predictor or each point along a continuous variable.

SteveDenham

PharmlyDoc · Posted 03-08-2022 10:02 AM

Hi @SteveDenham. Is the residual variance for a response variable with a logistic distribution in a mixed model not π²/3 or 3.29? Thanks

"In HGLMs, there is assumed to be no error at level-1, therefore, a slight modification is needed to calculate the ICC. This modification assumes the dichotomous outcome comes from an unknown latent continuous variable with a level-1 residual that follows a logistic distribution with a mean of 0 and a variance of 3.29 (Snijders & Bosker, 1999 as cited in O’Connell et al., 2008). Therefore, 3.29 will be used as our level-1 error variance in calculating the ICC." - https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf

STATA documentation states: "In a mixed-effects logistic and ordered logistic regression, errors are assumed to be logistic with mean 0 and variance γ. Random intercepts are assumed to be normally distributed with mean 0 and variance σ²2 and to be independent of error terms. The intraclass correlation for this model is ρ = Corr(y*ij, y*i'j ) = σ²2 /γ + σ²2 where γ = σ²1for a mixed-effects linear regression, γ = 1 for a mixed-effects probit and ordered probit regression, γ = π 2/3 for a mixed-effects logistic and ordered logistic regression, and γ = π 2/6 for a mixed-effects complementary log–log regression. The intraclass correlation corresponds to the correlation between the latent responses i and i' from the same group j." - https://www.stata.com/manuals/meestaticc.pdf

The logit "model also admits a latent variable formulation, except that this time the individual error terms eij are assumed to follow standard logistic rather than standard normal distributions. To be precise, we assume that Yij = 1 if, and only if, Y ∗ ij > 0, just as before. We further assume that Y ∗ ij follows the linear mixed model in (1) with mean η and ui ∼ N(0, σ2u), but eij now has a logistic distribution with mean 0 and variance σ2e. Just as before, we note that the scale of the latent variable is not identified. For convenience, we take eij to have a standard logistic distribution, which happens to have variance π2/3 or approximately 3.29." - https://ageconsearch.umn.edu/record/116030/files/sjart_st0031.pdf

SteveDenham · Posted 03-08-2022 10:36 AM

Well, I learned something there. The variance will converge in distribution to that value, but the estimate could be about anything with small to medium datasets. And note that the mean converges to zero. Now, I believe that is a good assumption for the null hypothesis of no differences, but the realization, given the data at hand, has for each level a non-zero mean, and if this is a multifactor model, the estimates of the level means won't necessarily sum to zero, if there is any imbalance in the data That is one thing to consider.

The other thing that bothers me a bit is that if I plug into the formula for the sampling variance of a binomial (equivalent to the null model), it looks like n * p hat * (i - p hat). If I plug in your null model value (0.3366) and assume a p hat value of 0.5 (a best case estimate) and solve for n, I get an n of about 13. Does that seem correct for your data?

SteveDenham

GLIMMIX and intraclass correlation coefficient interpretation?