Solved: Linear mixed model or GEE or Linear regression when many clusters have only one respondent?

bhr-q · Posted 05-05-2025 09:53 PM

Hello All,

I ran a linear mixed model (LMM) with country (55 countries) as a random intercept. The random intercept for country was statistically significant, and the model fit improved significantly, as evidenced by a lower -2 Log Likelihood compared with the model without country as a random effect.

proc mixed data=tmp method=ML covtest;
   class country;
   model dependent_var = var1 var2 .... / s ddfm=kr;
   random intercept / subject=country;
run;

My concern is that 22 countries have only one respondent and 9 countries have two respondents (the frequencies are listed below). I was thinking of saying: even though the model with country as a random intercept appears to fit better, I will use simple linear regression rather than the mixed model because of sparse data and unstable estimates. Or would it be better to run a GEE model with an independent working correlation structure?

proc genmod data=tmp;
   class country;
   model ave_score = var1 var2 .... / dist=normal link=id type3;
   repeated subject=country / type=ind;
run;

country	Frequency
country_1	1
country_2	1
country_3	1
country_4	24
country_5	1
country_6	2
country_7	1
country_8	5
country_9	22
country_10	2
country_11	1
country_12	1
country_13	20
country_14	12
country_15	1
country_16	1
country_17	1
country_18	1
country_19	1
country_20	2
country_21	2
country_22	1
country_23	2
country_24	1
country_25	6
country_26	1
country_27	3
country_28	4
country_29	32
country_30	5
country_31	1
country_32	6
country_33	15
country_34	10
country_35	1
country_36	1
country_37	18
country_38	2
country_39	3
country_40	2
country_41	1
country_42	3
country_43	6
country_44	8
country_45	1
country_46	2
country_47	2
country_48	7
country_49	6
country_50	18
country_51	5
country_52	1
country_53	1
country_54	9
country_55	3
total	290

I would appreciate your help in choosing the best approach,

Thanks so much!

StatDave · Posted 05-06-2025 07:02 PM

Both approaches can deal with the structure of your data. The random effects model in MIXED is a subject-specific model, best suited to individual predictions. The GEE model is a marginal, or population-averaged, model that is best suited to population-level inference. But as Allison notes in his book, "Fixed Effects Regression Methods for Longitudinal Data Using SAS" (Allison, P., SAS Institute, 2005), the two are effectively equivalent for a linear model like yours, though you might want to use the exchangeable structure (TYPE=EXCH) in the GEE model. Another possible approach is the fixed effects model, which Allison's book also discusses and which is implemented by the ABSORB statement in PROC GLM.

Note that the recommended procedure for fitting the GEE model is now PROC GEE, though GENMOD can certainly be used. Also, GEE is not a likelihood-based method, so model comparisons using the likelihood or likelihood-based measures such as AIC are not possible.
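
The two alternatives mentioned above might look roughly like the following. This is only a sketch: the data set tmp, the response ave_score, and the covariates var1 and var2 come from the original post, the covariate list is assumed to stop at var2 purely for illustration, and tmp_sorted is a hypothetical name.

```sas
/* GEE with an exchangeable working correlation, fit with PROC GEE */
proc gee data=tmp;
   class country;
   model ave_score = var1 var2 / dist=normal link=identity;
   repeated subject=country / type=exch;
run;

/* Fixed effects model: absorb the country intercepts in PROC GLM.
   ABSORB requires the data to be sorted by the absorbed variable. */
proc sort data=tmp out=tmp_sorted;
   by country;
run;

proc glm data=tmp_sorted;
   absorb country;
   model ave_score = var1 var2 / solution;
run;
quit;
```

Keep in mind that in the absorbed model a country with a single respondent is fit exactly by its own intercept, so such countries add nothing to the slope estimates.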

bhr-q · Posted 05-12-2025 09:14 PM

Thanks so much for your answer; it was helpful. The reason I used an independent correlation structure is that, if cluster size is informative, GEE models with an exchangeable correlation structure can produce biased estimates.
https://pubmed.ncbi.nlm.nih.gov/37439089/
https://pmc.ncbi.nlm.nih.gov/articles/PMC9908044/
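
One informal way to look at whether cluster size might be informative in data like this is to attach each country's respondent count to every row and see how the outcome varies with it. This check is an illustration only; it is not taken from the thread and is not claimed to be the method of the linked papers. The names tmp_size and n_country are hypothetical, while tmp and ave_score come from the earlier code.

```sas
/* Attach each country's respondent count to every row.
   PROC SQL remerges the COUNT(*) summary back onto the detail rows. */
proc sql;
   create table tmp_size as
   select *, count(*) as n_country
   from tmp
   group by country;
quit;

/* Summarize the outcome by cluster size */
proc means data=tmp_size n mean std;
   class n_country;
   var ave_score;
run;
```

A clear trend in the mean outcome across cluster sizes would be a hint, not a formal test, that cluster size carries information about the outcome.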
