dean_f
Calcite | Level 5

Hi all,

 

We have a cohort of approximately 400,000 children born to 150,000 mothers, i.e., many mothers had more than one child during the span of the study. We are looking at different perinatal risk factors for late onset of diabetes in the child.

 

We want to account for the correlation induced by siblings sharing a mother. In general, this can be done with GEE approaches (using a REPEATED statement) or with mixed models. Our data contain missing values for several variables (about 7%). We don’t believe the data are missing completely at random (MCAR), but we do believe they are missing at random (MAR) given the other covariates in the model. Since standard GEE approaches assume MCAR, we chose the mixed model approach, specifying a random intercept for maternal study ID, which gives unbiased estimates when the data are MAR:

 

proc glimmix data=crt noclprint;
   class mom_id;
   model y(descending) = x w z / dist=binary link=logit;
   random intercept / subject=mom_id type=cs;
run;

 

The problem is that, given the size of the data, PROC GLIMMIX fails to run because it has to account for so many random components (i.e., a random intercept for each mother), and it gives the following message: “Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time”. We are able to run GEE models using PROC GENMOD, but we are somewhat concerned about obtaining biased estimates because of the missing-data pattern mentioned above.
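For reference, a minimal sketch of the kind of GEE model that does run in PROC GENMOD, assuming the same variable names as the GLIMMIX call above. The exchangeable working correlation (type=exch) is an assumption chosen for illustration, not necessarily the structure we settled on:

```sas
/* GEE with mothers as clusters; sketch only */
proc genmod data=crt descending;   /* DESCENDING models the higher level of y as the event */
   class mom_id;
   model y = x w z / dist=bin link=logit;
   /* type=exch is an illustrative choice of working correlation */
   repeated subject=mom_id / type=exch;
run;
```

With the REPEATED statement, PROC GENMOD reports empirical (sandwich) standard errors by default, so the inference stays valid even if the working correlation is misspecified, provided the missingness assumption holds.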

 

Could something be done to make the GLIMMIX procedure less computationally intensive? I am aware of the following (http://support.sas.com/resources/papers/proceedings12/332-2012.pdf), but this mostly refers to situations in which the distribution is normal, whereas ours is binomial.

 

Thank you all,

Dean

3 REPLIES 3
Rick_SAS
SAS Super FREQ

This is outside my area of expertise, but look at the new(ish) GEE procedure. It supports a binary response and the overview section says "When the data are missing at random (MAR), the weighted GEE method produces valid inference."  There is an example in the doc that describes how to use the weighted GEE method.
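A rough sketch of what a weighted GEE call might look like, modeled on the general shape of the documentation example rather than on this data. The variables birth_class (a CLASS copy of the child's birth order within mother) and prev_y (the outcome of the previous sibling) are hypothetical and would have to be constructed first. Note also that the weighted method in PROC GEE is designed for dropout in the response, so whether it suits missing covariate values is something to check against the documentation:

```sas
/* Weighted GEE sketch; birth_class and prev_y are hypothetical variables */
proc gee data=crt descending;
   class mom_id birth_class;
   /* MISSMODEL specifies the model for the probability of being observed */
   missmodel prev_y / type=obslevel;
   model y = x w z / dist=bin link=logit;
   /* WITHIN= orders the observations inside each mother */
   repeated subject=mom_id / within=birth_class corr=exch;
run;
```

TYPE=OBSLEVEL requests observation-specific weights; TYPE=SUBLEVEL is the subject-level alternative.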

dean_f
Calcite | Level 5

Thank you! One of the reasons we wanted to use PROC GLIMMIX is that we believe the correlation between repeated measures (in our case, children) decreases over time, but the measurements are not equally spaced. In GLIMMIX, one can specify type=sp(pow), which addresses that. It is true that with GEE and robust sandwich estimates we still get valid effect estimates and SEs, but this is something we wanted to explore. Is there a similar function available in PROC GENMOD?
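To make the type=sp(pow) idea concrete, here is a sketch of the sort of specification we have in mind. The variable birth_time is hypothetical: a numeric measure of when each child was born (for example, years since the mother's first delivery), and it must not appear in the CLASS statement:

```sas
/* Spatial power structure for unequally spaced births; birth_time is hypothetical */
proc glimmix data=crt noclprint;
   class mom_id;
   model y(descending) = x w z / dist=binary link=logit;
   /* R-side covariance: correlation between siblings is rho**|difference in birth_time| */
   random _residual_ / subject=mom_id type=sp(pow)(birth_time);
run;
```

Because the structure is placed on the R side through _RESIDUAL_, this is a marginal (GEE-type) model fit by pseudo-likelihood rather than a random-intercept model.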

Thanks again
 
