Solved: proc glimmix help

edhuang · Posted 06-21-2020 01:08 PM

Hi,

I am new to proc glimmix, but I am trying to get the intraclass correlation in a 2-level nested model with a binary outcome (polyp - yes or no). Level 1 is physician and level 2 is clinic, with physicians nested within clinics.

proc glimmix data=ADR noclprint method=laplace nobound;
class MD MDlocation patient ;
model polyp_yes(event=last)=/CL Dist=binary link=logit solution;
random intercept/sub=MDlocation type=vc s cl;
random intercept/sub=MD(MDlocation) type=vc s cl;
run;

Here is my output for Covariance Parameter Estimates:

Cov Parm	Subject	Estimate	Standard Error
Intercept	MDlocation	-0.7062
Intercept	MD(MDlocation)	13.6852	.

Here are my questions.

1) Is my proc glimmix code correct?

2) How come my standard error is missing?

3) How do I calculate the intraclass correlation for each of the two levels for a binary outcome? In other words, what % of total variation is accounted for by MD and what % of total variation is accounted for by MDlocation?

sld · Posted 06-21-2020 05:14 PM

The missing SEs suggest there is something that is incompatible between your data and your model. Can you show us an example of what your data set looks like? Also, how many clinics? How many physicians within clinics? How many patients within physicians within clinic? (Roughly; it's probably unbalanced.)

edhuang · Posted 06-21-2020 06:48 PM

I have 130,681 patients, 30 physicians, and 3 clinics. If it is unbalanced, is there anything I can do? Should I analyze a subset of patients/MDs?

MDlocation	polyp_yes=0	polyp_yes=1
0	15795	8936
1	37505	17034
2	38236	13175

MD	polyp_yes=0	polyp_yes=1
5	4087	4
6	1666	0
7	3188	2751
9	5014	2384
12	2902	1748
13	3221	1259
15	2087	1390
19	1894	1264
20	3574	0
21	1773	518
22	1806	249
25	8277	4525
27	1491	1270
29	688	559
30	4395	1
32	1550	0
33	5342	1
36	5583	2915
37	3360	1802
39	3361	0
40	447	0
41	922	508
43	3855	2454
44	1894	1084
45	4148	2634
46	4618	2114
49	2378	1390
52	4635	1736
54	3682	2641
58	3377	1874

MD	MDlocation=0	MDlocation=1	MDlocation=2
5	0	0	282
6	1666	0	0
7	0	0	5939
9	0	0	7398
12	4650	0	0
13	0	4480	0
15	3477	0	0
19	3158	0	0
20	0	0	3574
21	0	2291	0
22	0	2055	0
25	0	12802	0
27	2761	0	0
29	0	0	1247
30	0	0	4396
32	0	1550	0
33	0	0	5343
36	0	8498	0
37	0	0	5162
39	0	0	3361
40	0	0	447
41	0	0	1430
43	0	0	6309
44	0	2978	0
45	0	6782	0
46	0	6732	0
49	3768	0	0
52	0	6371	0
54	0	0	6323
58	5251	0	0

sld · Posted 06-21-2020 08:24 PM

Thank you, that's very helpful.

With only 3 levels of MDlocation, your ability to estimate a variance is limited; there's just not enough information. In the model run that you reported, the variance was estimated as a negative value because you specified the nobound option; otherwise the estimate would have been set to zero. If your goal is to estimate an ICC for the MDlocation and MD variances, you just don't have enough data to work with.
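For reference, a minimal sketch of the same model refit without NOBOUND, so that the variance estimates are constrained to be non-negative. It reuses the data set and variable names from the original post; dropping patient from the CLASS statement is an assumption here, made because patient does not appear anywhere else in the model.

```sas
/* Sketch only: the model from the original post, minus NOBOUND,      */
/* so variance estimates are bounded below by zero.                   */
/* Assumption: patient is dropped from CLASS because the MODEL and    */
/* RANDOM statements never use it.                                    */
proc glimmix data=ADR noclprint method=laplace;
  class MD MDlocation;
  model polyp_yes(event=last) = / dist=binary link=logit solution cl;
  random intercept / subject=MDlocation type=vc solution cl;     /* clinic level    */
  random intercept / subject=MD(MDlocation) type=vc solution cl; /* physician level */
run;
```

This does not add any information about the clinic-level variance. It only changes how the estimate is reported: zero rather than a negative value.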

I am puzzled by an aspect of your tables. In the second table, MD = 5 appears to have 4087 + 4 = 4091 observations. But in the third table, MD = 5 is reported as being at MDlocation = 2, with 282 observations. On a quick scan, the next few MDs look OK, but I didn't look at all 30. Maybe it's just a copy/paste error in the message.

Also, I'm struck by how some MDs have very few polyp_yes observations, while many have about 30%. What distinguishes these two groups of physicians, if anything?

edhuang · Posted 06-21-2020 09:10 PM

Thanks. That's helpful. I made a copy/pasting error on MD5. Thanks for catching that. The polyp observations were missing for some physicians, hence the variability.

Follow-up questions:
1) How do you estimate the number of locations you need to get an SE? Is it a ratio between level 1 and level 2 predictors?
2) Assuming that I did get an SE, is there a formula to calculate the ICC for a two-level model with a binary outcome? I understand it is estimate/(estimate+3.29) for 1 level, but what about for two levels?
3) I also tried using patient as level 1, nested within physicians (level 2). However, the log said the system can't handle this large a number. Is there an easy way to deal with this?

SteveDenham · Posted 06-22-2020 08:16 AM

I can take a shot at numbers 1 and 3, but I really lack expertise on ICC so I will let number 2 go.

1. A good rule of thumb for estimating a variance component for binary data is to have at least 10 clusters for the level in question. My source here is from the R community (see anything online from Bolker or Zuur).

3. For this design, the patient level is completely confounded with residual error, so there is no need to include it as a level. There should be one more covariance parameter estimate given for residual in your output. If not, then there is something else going on.

Additionally, although I hate throwing out data, consider eliminating those MD's with 5 or fewer observations. Also, you may want to have more levels for MDLocation, so perhaps a more granular classification would be in order. You may not be able to come up with 10, but certainly there is some information that would lead to more than 3 levels.
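One way to sketch the filtering step. Two assumptions are made here: polyp_yes is a numeric 0/1 variable, and "5 or fewer observations" is read as 5 or fewer polyp-positive records per MD (every MD in the posted tables has hundreds of records in total). ADR_subset is a hypothetical output name.

```sas
/* Sketch: keep only physicians with more than 5 polyp-positive records. */
/* Assumes polyp_yes is numeric 0/1; ADR_subset is a hypothetical name.  */
proc sql;
  create table ADR_subset as
  select *
  from ADR
  group by MD
  having sum(polyp_yes = 1) > 5;  /* per-MD count of positives, remerged onto each row */
quit;
```

PROC SQL writes a note to the log about remerging summary statistics with the original data; that is expected for this kind of query. Change the threshold in the HAVING clause if a different cutoff is wanted.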

SteveDenham

edhuang · Posted 06-22-2020 03:04 PM

Hi Steve,

Thanks for your reply. This is extremely helpful. I believe I can get further granularity on the location, so I will try that.
I will also try eliminating the MDs with few observations.

sld · Posted 06-22-2020 11:53 AM

Q1: There are rules of thumb as @SteveDenham notes; rather than 10, I would have said 20-30, but that's what rules of thumb are 🙂 More formally, you can determine the sample size required to estimate a variance (or standard deviation) with a given precision, analogous to determining sample size required to estimate a mean with a given precision. An internet search will turn up several resources.
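As a rough illustration of why a handful of clusters gives so little precision, consider the best case in which the k cluster effects are observed directly and are normally distributed with variance σ². A binary mixed model does not meet this assumption, so treat the numbers as optimistic. In that case the sample variance s² has a scaled chi-square distribution, which gives a simple relative standard error:

```latex
\[
\frac{(k-1)\,s^2}{\sigma^2} \sim \chi^2_{k-1}
\quad\Longrightarrow\quad
\frac{\operatorname{SE}(s^2)}{\sigma^2} = \sqrt{\frac{2}{k-1}}
\]
\[
k = 3:\ 1.00, \qquad k = 10:\ \approx 0.47, \qquad k = 30:\ \approx 0.26
\]
```

With 3 clusters the standard error is as large as the variance itself. In a mixed model the cluster effects are latent rather than observed, so the achievable precision is generally worse than this.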

Q2: Like Steve, I don't know much about computing ICC in a mixed model with binary data. On a quick scan, this blogpost looks reasonable, and it points out that with binary data, there is no residual variance per se, because the variance and the mean are determined by the same parameters. For more detail, there are also papers, among them:

Nakagawa S, Johnson P, Schielzeth H (2017) The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14. https://doi.org/10.1098/rsif.2017.0213

Wu S, Crespi CM, Wong WK. 2012. Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials. Contemp Clin Trials 33(5):869-80. doi: 10.1016/j.cct.2012.05.004
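For what it is worth, the latent-variable formulation that gives estimate/(estimate + 3.29) for a single random intercept extends to two random intercepts by putting both variances in the denominator. A sketch, assuming a logit link. The symbols are introduced here for illustration: σ²_clinic and σ²_MD stand for the MDlocation and MD(MDlocation) variance estimates, and π²/3 ≈ 3.29 is the variance of the standard logistic distribution.

```latex
\[
\text{share}_{\text{clinic}}
  = \frac{\sigma^2_{\text{clinic}}}{\sigma^2_{\text{clinic}} + \sigma^2_{\text{MD}} + \pi^2/3},
\qquad
\text{share}_{\text{MD}}
  = \frac{\sigma^2_{\text{MD}}}{\sigma^2_{\text{clinic}} + \sigma^2_{\text{MD}} + \pi^2/3}
\]
\[
\rho_{\text{same MD}}
  = \frac{\sigma^2_{\text{clinic}} + \sigma^2_{\text{MD}}}{\sigma^2_{\text{clinic}} + \sigma^2_{\text{MD}} + \pi^2/3}
\]
```

The first two quantities are the proportions of latent-scale variance attributable to clinic and to physician within clinic. The third is the correlation between two patients of the same physician, who necessarily share a clinic. All three are on the logit scale and are only meaningful for non-negative variance estimates; the papers above discuss alternatives on the observed scale.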

Q3: For the binary mixed model, the residual variance depends upon the expected value and so it cannot be estimated directly from the data. Thus, you do not want to force your model to estimate a residual variance.

See Section 7 in Nakagawa et al (2017) for a discussion of the distinction between using an observation-level variance (estimated using the delta method) versus a distribution-specific variance.

I hope this helps.

edhuang · Posted 06-22-2020 03:07 PM

Hi Sld,

Thanks for your and Steve's replies. They are very helpful. I will use your references. They will come in handy in the future!

sld · Posted 06-22-2020 11:59 AM

Regarding that polyp data were missing for some physicians: What is the nature of this missingness? Does this mean that some of the patients in the polyp=0 category actually had polyps that their physician did not note in the chart? I would be concerned about this as a source of bias. How many of your physicians are bad record keepers, and should they be omitted from the analysis?

edhuang · Posted 06-22-2020 03:12 PM

Hi,

You raise good points. The nature of missingness is likely due to both technical issues from data extraction and also lack of reporting by physicians. Yes, so they may actually have polyps. Fortunately, most of these are smaller observations. And you are right that bias may be introduced. Not sure if it is random bias. I will perform my analysis with and without them to determine the final model.

sld · Posted 06-22-2020 11:46 PM

It's so dichotomous (physicians have either almost zero or about 30% polyps) that I doubt it is random. The "smaller observations" is actually a problem, not a salve. Merely running the analysis with and without these potentially problematic physicians does not address the underlying issue. I say, give this more thought. Good luck!
