BookmarkSubscribeRSS Feed
PharmlyDoc
Quartz | Level 8

I am trying to understand how to interpret the intraclass correlation coefficient, which I am calculating from proc glimmix. 

• For this example I am using the Hospital, Doctor, Patient (HDP) dataset from UCLA. https://stats.oarc.ucla.edu/r/codefragments/mesimulation/ 

• I will be modeling for the outcome of cancer remission and my fixed predictors will be IL6, CancerStage, Age, and FamilyHx, and patients are nested within hospitals (HID)

• I will assume that I do not have doctor ID's (DID), so I will not be nesting by doctors, nor do I have variables describing the characteristics of hospitals

• I want to know the odds of remission by the specified predictors, while also accounting for differences in remission between or within hospitals -  which is where the intraclass correlation coefficient comes in. 

• These resource states to calculate the ICC from the "empty, unconditional model with no predictors", only the random intercept: https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf  , https://www.lesahoffman.com/CLDP945/CLDP945_Example07a_Binary_Clustered.pdf , https://www.lexjansen.com/sesug/2015/173_Final_PDF.pdf , https://support.sas.com/resources/papers/proceedings12/350-2012.pdf   – Do I calculate the ICC from the random intercept model, or from the full model? How do I interpret the ICC?

 

With the random intercept model, I calculate an ICC of .093. Does this mean that 9% of the variability in going into remission is accounted for by the hospitals? --> ICC = Covariance Parameter Estimate/(Covariance Parameter Estimates + 3.29) -->
ICC = 0.3366/(0.3366+3.29) = .093

 

filename csvFile url "https://stats.idre.ucla.edu/stat/data/hdp.csv" termstr=crlf; 
 
proc import datafile=csvFile  
dbms=csv  
out=hdp ; 
getnames=yes; 
guessingrows=3000; 
run; 

proc sql;
select
distinct(FamilyHx)
from hdp;
quit;


proc glimmix data=hdp PLOTS=(ALL ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT)) namelen=30 ; 
class HID  ; 
   model remission(event='1') = 
/ distribution=binary link=logit cl  oddsratio solution ddfm=betwithin; 
   random intercept / subject=HID solution CL  ;
COVTEST /WALD; 
run; 
quit; 
/* Covariance Parameter Estimates for HID = 0.3366  
ICC = Covariance Parameter Estimate/(Covariance Parameter Estimates + 3.29) 
ICC = 0.3366/(0.3366+3.29)  = .093
9% of the variability in going into remission is accounted for by the hospitals? */

 

With the full model, I calculate an ICC of 10%. 

proc glimmix data=hdp PLOTS=(ALL ODDSRATIO(logbase=10 TYPE=HORIZONTALSTAT)) namelen=30 ; 
class HID CancerStage(ref='I') FamilyHx(ref="no"); 
   model remission(event='1') = IL6 CancerStage Age FamilyHx
/ distribution=binary link=logit cl  oddsratio solution ddfm=betwithin; 
   random intercept / subject=HID solution CL ; 
COVTEST / WALD;
run; 
quit; 
/* Covariance Parameter Estimates for HID = 0.3721 
ICC = Covariance Parameter Estimate/(Covariance Parameter Estimates + 3.29) 
ICC = 0.3721/(0.3721+3.29)  = .1016
10% of the variability in going into remission is accounted for by the hospitals? */

Thank you. 

 

5 REPLIES 5
sbxkoenk
SAS Super FREQ

Hello,

 

I am sorry for not answering your question directly, but I just wanted to make you aware about this :

 

S A S  S A M P L E  L I B R A R Y
Intraclass Correlations

https://support.sas.com/documentation/onlinedoc/stat/ex_code/131/intracc.html

 

Also, please visit https://www.lexjansen.com/

Then enter 'intraclass correlation' as search terms and hit the magnifying glass.

 

Good luck,

Koen

PharmlyDoc
Quartz | Level 8

@sbxkoenk wrote:

Hello,

 

I am sorry for not answering your question directly, but I just wanted to make you aware about this :

 

S A S  S A M P L E  L I B R A R Y
Intraclass Correlations

https://support.sas.com/documentation/onlinedoc/stat/ex_code/131/intracc.html

 

Also, please visit https://www.lexjansen.com/

Then enter 'intraclass correlation' as search terms and hit the magnifying glass.

 

Good luck,

Koen


Thanks for the feedback. The example from the SAS Sample Library is unfortunately not helpful because it calculates inter-rater reliability. I searched lexjansen.com for "intraclass correlation" AND "3.29" and for "intraclass correlation" AND "glimmix", and the pertinent responses are listed in my original post. I'm now looking for insight from other experts.

 

Interestingly, a 2018 SAS paper states: 

"How Do You Compute an Intra-Class Correlation Coefficient (ICC) for a Non-Normal Response Model in PROC GLIMMIX (Binary, Multinomial, and so on)? SAS is not aware of a generally accepted method for calculating an ICC in a logistic model, mainly because there is no concept of a residual variance in a logistic regression model. You can form your own ratios of the variance components in your model, but SAS does not endorse such an ICC calculation for a non-normal response model." - https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2179-2018.pdf 

I wonder if SAS has since changed their position on endorsing an ICC calculation for a binary response response model? @jiltao 

 

SteveDenham
Jade | Level 19

I doubt that there has been a change in the position.  The variance component ratios can be calculated, but the are NOT correlations.  In particular, there is no "residual" variance for the binomial distribution as the variance is a function of the point estimate, and changes for each level of a categorical predictor or each point along a continuous variable.

 

SteveDenham

PharmlyDoc
Quartz | Level 8

Hi @SteveDenham. Is the residual variance for a response variable with a logistic distribution in a mixed model not π2/3 or 3.29? Thanks

 

 

"In HGLMs, there is assumed to be no error at level-1, therefore, a slight modification is needed to calculate the ICC. This modification assumes the dichotomous outcome comes from an unknown latent continuous variable with a level-1 residual that follows a logistic distribution with a mean of 0 and a variance of 3.29 (Snijders & Bosker, 1999 as cited in O’Connell et al., 2008). Therefore, 3.29 will be used as our level-1 error variance in calculating the ICC." - https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf

 

STATA documentation states: "In a mixed-effects logistic and ordered logistic regression, errors are assumed to be logistic with mean 0 and variance γ. Random intercepts are assumed to be normally distributed with mean 0 and variance σ22 and to be independent of error terms. The intraclass correlation for this model is ρ = Corr(y*ij, y*i'j ) = σ22 /γ + σ22        where γ = σ21  for a mixed-effects linear regression, γ = 1 for a mixed-effects probit and ordered probit regression, γ = π 2/3 for a mixed-effects logistic and ordered logistic regression, and γ = π 2/6 for a mixed-effects complementary log–log regression. The intraclass correlation corresponds to the correlation between the latent responses i and i' from the same group j." - https://www.stata.com/manuals/meestaticc.pdf 

 

The logit "model also admits a latent variable formulation, except that this time the individual error terms eij are assumed to follow standard logistic rather than standard normal distributions. To be precise, we assume that Yij = 1 if, and only if, Y ∗ ij > 0, just as before. We further assume that Y ∗ ij follows the linear mixed model in (1) with mean η and ui ∼ N(0, σ2u), but eij now has a logistic distribution with mean 0 and variance σ2e. Just as before, we note that the scale of the latent variable is not identified. For convenience, we take eij to have a standard logistic distribution, which happens to have variance π2/3 or approximately 3.29." - https://ageconsearch.umn.edu/record/116030/files/sjart_st0031.pdf 

 

 

SteveDenham
Jade | Level 19

Well, I learned something there.  The variance will converge in distribution to that value, but the estimate could be about anything with small to medium datasets.  And note that the mean converges to zero.  Now, I believe that is a good assumption for the null hypothesis of no differences, but the realization, given the data at hand, has for each level a non-zero mean, and if this is a multifactor model, the estimates of the level means won't necessarily sum to zero, if there is any imbalance in the data  That is one thing to consider.

 

The other thing that bothers me a bit is that if I plug into the formula for the sampling variance of a binomial (equivalent to the null model), it looks like n * p hat * (i  -    p hat).  If I plug in your null model value (0.3366) and assume a p hat value of 0.5 (a best case estimate) and solve for n, I get an n of about 13.  Does that seem correct for your data?

 

SteveDenham

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 3619 views
  • 0 likes
  • 3 in conversation