Hello All,
I ran a linear mixed model (LMM) with country (55 countries) included as a random intercept. The random intercept for country was statistically significant, and the model fit improved significantly, as evidenced by a lower -2 log likelihood, compared with the model without country as a random effect.
proc MIXED data=tmp method=ML covtest;
class country;
model dependent_var =var1 var2 ..../s ddfm=kr ;
random intercept /subject=country;
run;
My concern is that 22 countries have only one respondent and 9 countries have two respondents (the frequencies are listed below). I was thinking of saying: even though the model with country as a random intercept appears to fit better, I will use simple linear regression rather than a mixed model because of the sparse data and unstable estimates. Or would it be better to run a GEE model with an independent correlation structure?
proc genmod data=tmp;
class country ;
model ave_score =var1 var2 .... / dist=normal link=id type3;
repeated subject=country/ type=ind;
run;
country | Frequency |
country_1 | 1 |
country_2 | 1 |
country_3 | 1 |
country_4 | 24 |
country_5 | 1 |
country_6 | 2 |
country_7 | 1 |
country_8 | 5 |
country_9 | 22 |
country_10 | 2 |
country_11 | 1 |
country_12 | 1 |
country_13 | 20 |
country_14 | 12 |
country_15 | 1 |
country_16 | 1 |
country_17 | 1 |
country_18 | 1 |
country_19 | 1 |
country_20 | 2 |
country_21 | 2 |
country_22 | 1 |
country_23 | 2 |
country_24 | 1 |
country_25 | 6 |
country_26 | 1 |
country_27 | 3 |
country_28 | 4 |
country_29 | 32 |
country_30 | 5 |
country_31 | 1 |
country_32 | 6 |
country_33 | 15 |
country_34 | 10 |
country_35 | 1 |
country_36 | 1 |
country_37 | 18 |
country_38 | 2 |
country_39 | 3 |
country_40 | 2 |
country_41 | 1 |
country_42 | 3 |
country_43 | 6 |
country_44 | 8 |
country_45 | 1 |
country_46 | 2 |
country_47 | 2 |
country_48 | 7 |
country_49 | 6 |
country_50 | 18 |
country_51 | 5 |
country_52 | 1 |
country_53 | 1 |
country_54 | 9 |
country_55 | 3 |
total | 290 |
I would appreciate your help in choosing the best approach,
Thanks so much!
Both approaches can handle the structure of your data. The random effects model in PROC MIXED is a subject-specific model, best suited to individual predictions. The GEE model is a marginal, or population-averaged, model, best suited to population-level inference. However, as Allison notes in "Fixed Effects Regression Methods for Longitudinal Data Using SAS" (Allison, P., SAS Institute, 2005), the two are effectively equivalent for a linear model like yours, though you might want to use the exchangeable structure (TYPE=EXCH) in the GEE model, since a random intercept implies the same correlation between any two respondents in a country. Another possible approach is the fixed effects model, which Allison's book also discusses and which is implemented by the ABSORB statement in PROC GLM.
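As a rough sketch of the fixed effects approach, something like the following could be tried. The data set name (tmp) and variable names (dependent_var, var1, var2) are taken from the question, the remaining covariates are elided there and must be filled in, and the SOLUTION option is my addition so that the slope estimates are printed. ABSORB requires the data to be sorted by the absorbed variable.

```sas
/* ABSORB expects the input sorted by the absorbed variable */
proc sort data=tmp;
   by country;
run;

/* Fixed effects (within-country) regression: country is absorbed
   rather than estimated as 55 separate dummy coefficients.
   Append the remaining covariates that the original post elides. */
proc glm data=tmp;
   absorb country;
   model dependent_var = var1 var2 / solution;
run;
quit;
```

Keep in mind that this model uses only within-country variation. The 22 countries with a single respondent therefore contribute nothing to the slope estimates, and any covariate that is constant within country cannot be estimated.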
Note that the recommended procedure for fitting the GEE model is now PROC GEE, though GENMOD can certainly be used. Also, the GEE model is not likelihood-based, so model comparisons using the likelihood or likelihood-derived measures such as AIC are not possible.
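For illustration, the GEE model from the question might be rewritten for PROC GEE with the exchangeable structure roughly as follows. This is a sketch, not a tested program: the data set and variable names (tmp, ave_score, var1, var2) come from the question, the remaining covariates are elided there, and the CORRW option is my addition to display the estimated working correlation matrix.

```sas
/* Population-averaged linear model with an exchangeable working
   correlation within country.
   Append the remaining covariates that the original post elides. */
proc gee data=tmp;
   class country;
   model ave_score = var1 var2 / dist=normal link=identity;
   repeated subject=country / type=exch corrw;
run;
```

With TYPE=IND the point estimates match ordinary least squares and only the standard errors are adjusted for clustering, so TYPE=EXCH is the closer analogue of the random intercept model.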