Hi all,

We have a cohort of approximately 400,000 children born to 150,000 mothers, i.e. multiple children were born to each mother during the span of the study. We are looking at different perinatal risk factors for late onset of diabetes in the child, and we want to account for the correlation structure that arises from children sharing a mother. In general, this can be done either with a GEE approach using a repeated statement or with a mixed model.

The data contain missing values for several variables (about 7%). We don't believe the data are missing completely at random (MCAR), but we do believe they are missing at random (MAR) given the other covariates in the model. Since standard GEE approaches assume MCAR, we chose the mixed model approach, specifying a random intercept for maternal study ID, which gives unbiased estimates when the data are MAR:

proc glimmix data=crt NOCLPRINT;
  class mom_id;
  model y (descending) = x w z / dist=binary link=logit;
  random intercept / subject=mom_id type=cs;
run;

The problem is that, given the size of the data, PROC GLIMMIX fails to run because it has to account for so many random components in the model (i.e. a random intercept for each mother), and it gives the following message: “Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time”.

We are able to run GEE models using PROC GENMOD, but we are somewhat concerned about obtaining biased estimates because of the missing data pattern described above.

Could something be done to make the GLIMMIX procedure less computationally intensive? I am aware of the following paper (http://support.sas.com/resources/papers/proceedings12/332-2012.pdf), but it mostly refers to situations in which the distribution is normal, whereas ours is binomial.

Thank you all,
Dean
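For comparison, the GEE alternative mentioned above could be sketched roughly as follows. This is an illustration only, not the exact call that was run: the exchangeable working correlation (type=exch) is an assumption, and the data set and variable names are simply reused from the GLIMMIX call.

```sas
/* Hedged sketch of a GEE fit with PROC GENMOD (not the poster's exact code). */
/* Assumption: exchangeable working correlation among children of one mother. */
proc genmod data=crt descending;
  class mom_id;
  /* Same outcome and covariates as the GLIMMIX model above */
  model y = x w z / dist=binomial link=logit;
  /* The REPEATED statement defines the clusters (mothers) for GEE */
  repeated subject=mom_id / type=exch;
run;
```

Note that the GEE coefficients are population-averaged, while the random intercept model gives mother-specific (conditional) effects, so the two sets of estimates are not directly interchangeable.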