Hi,
In a longitudinal cohort study I am investigating individuals exposed to environmental toxins in early childhood and their risk of developing ADHD. The data have a multilevel structure: subjects within families (family_id) within regions (the variable f_region below). I used PROC GLIMMIX and ran the syntax below (lnpyrs is the log of person-years).
Proc glimmix data=temp method=quad;
class toxin yeargrp SES f_region family_id;
model case=toxin yeargrp SES/ dist=poisson link=log offset=lnpyrs covb cl solution;
random intercept / subject=f_region;
random intercept / subject=family_id(f_region);
run;
However, I have a problem when I include families (family_id) in the multilevel model, because there are too few individuals in each stratum, or maybe too many strata. In any case, the model does not converge. Does anyone have a suggestion for how to handle this problem?
I need to keep the individual level and I cannot exclude siblings.
I have been told that some people take clustering into account by estimating clustered standard errors, e.g. with the REPEATED statement in PROC GENMOD. Is there a similar option in PROC GLIMMIX?
I hope to hear from some of you 🙂
Best regards,
Malene
When you say "does not converge", the first thing I look for is an NLOPTIONS statement, where you can increase the maximum number of iterations beyond the default (which depends on the optimization technique). For now try adding
nloptions maxiter=1000;
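In context, the statement goes inside the PROC GLIMMIX step, before the RUN statement. A minimal sketch based on the model posted above, with nothing else changed:

```sas
proc glimmix data=temp method=quad;
  class toxin yeargrp SES f_region family_id;
  model case = toxin yeargrp SES / dist=poisson link=log offset=lnpyrs covb cl solution;
  random intercept / subject=f_region;
  random intercept / subject=family_id(f_region);
  /* Raise the cap on optimizer iterations */
  nloptions maxiter=1000;
run;
```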
If you still do not converge, then look at this paper for some really good ideas about improving convergence in mixed models:
https://support.sas.com/resources/papers/proceedings12/332-2012.pdf
SteveDenham
Thanks again for your response 🙂
When I add
nloptions maxiter=1000;
I get this error: "The SAS System stopped processing this step because of insufficient memory."
I have approximately 900,000 individuals in my cohort, which means that fam_id has about 400,000 strata, and some strata include only one individual.
I just tried PROC GENMOD with repeated subject=fam_id and I have the same problems: non-convergence and insufficient memory. Is it because I have too many strata with too few individuals?
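For reference, a GENMOD call of the kind described here might look roughly like the sketch below. The exact statements that were run are not shown in this thread, so the CLASS list and the working correlation (type=ind) are assumptions.

```sas
proc genmod data=temp;
  /* In GENMOD the SUBJECT= variable must appear in the CLASS statement */
  class toxin yeargrp SES fam_id;
  model case = toxin yeargrp SES / dist=poisson link=log offset=lnpyrs;
  /* GEE with an independence working correlation (assumed here);
     the empirical standard errors account for clustering within fam_id */
  repeated subject=fam_id / type=ind;
run;
```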
Best,
Malene
Yes, that is the problem, as you describe it. I suppose if you had terabytes of memory it wouldn't be an issue.
What can be done? First, you need to consolidate some of these levels. You have region, family within region, and individual within family (= residual). An R-side approach in GLIMMIX may be useful (but I worry about twins under this approach). Consider family as a repeated measure within region. If you aggregate at the family level, and assign a weight equal to the number of family members, this might work (no guarantees, though):
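proc glimmix data=temp;
  class toxin yeargrp SES f_region;
  nloptions maxiter=500 tech=nrridg;
  model case = toxin yeargrp SES f_region / dist=poisson link=log offset=lnpyrs covb cl solution;
  /* R-side (residual) compound-symmetry structure within each family */
  random f_region / residual subject=family_id type=cs;
  /* famnumbers = number of family members; WEIGHT takes a variable name, no equals sign */
  weight famnumbers;
run;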
There are some important considerations here.
First, you need a unique numeric family_id for each family so that it can be treated as a continuous variable (that is, left out of the CLASS statement). It would be a good idea to sort the dataset by this variable, so that each family's records are contiguous.
Second, toxin, yeargrp, and SES may have different effects in each region. This model would give the marginal effects averaged over region. To get effects for each region, you would have to specify interactions with f_region.
Third, case will have to be aggregated at the family level. This part is not too difficult using PROC MEANS, which would also give the number of observations for each family (famnumbers). Joining the result back to the original dataset (the design matrix) may be the harder part.
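The aggregation in the third point might be sketched as follows. This is only an illustration: the dataset names temp_sorted, fam_summary, and temp_fam and the variable fam_cases are hypothetical, and how the covariates should be carried to the family level is left open.

```sas
/* Sort once so that BY-group processing and the later merge both work */
proc sort data=temp out=temp_sorted;
  by family_id;
run;

/* One row per family: total number of cases and number of members */
proc means data=temp_sorted noprint;
  by family_id;
  var case;
  output out=fam_summary(drop=_type_ _freq_) sum=fam_cases n=famnumbers;
run;

/* Attach the family-level summaries to every original record */
data temp_fam;
  merge temp_sorted fam_summary;
  by family_id;
run;
```

Note that n= counts non-missing values of case, so famnumbers would undercount families with missing outcomes. If the outcome is summed per family, the offset would presumably need matching treatment (the log of the family's summed person-years).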
My fear is that if the clustered standard error approach failed in GENMOD due to memory constraints, then the same will happen here. Jack-knifing the big dataset into smaller sets that will run, and then consolidating the results, might be an approach to consider in that case.
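One way the split-and-consolidate idea could be sketched is shown below. Everything specific here is an assumption rather than part of the suggestion above: the number of subsets (k), the random assignment of whole families to subsets, the dataset names, and above all the use of inverse-variance weighting to combine the per-subset estimates.

```sas
%let k = 20;   /* hypothetical number of subsets; tune to available memory */

/* One row per family, assigned at random to a subset so that
   all members of a family stay together */
proc sort data=temp(keep=family_id) out=fams nodupkey;
  by family_id;
run;

data fams;
  set fams;
  if _n_ = 1 then call streaminit(20250101);   /* arbitrary seed */
  subset = ceil(&k * rand('uniform'));
run;

proc sort data=temp out=temp_sorted;
  by family_id;
run;

data temp_split;
  merge temp_sorted fams;
  by family_id;
run;

proc sort data=temp_split;
  by subset family_id;
run;

/* Fit the same model within each subset and capture the fixed effects */
ods output ParameterEstimates=pe;
proc glimmix data=temp_split method=quad;
  by subset;
  class toxin yeargrp SES f_region family_id;
  model case = toxin yeargrp SES / dist=poisson link=log offset=lnpyrs solution;
  random intercept / subject=f_region;
  random intercept / subject=family_id(f_region);
run;

/* Inverse-variance weights; reference levels (missing StdErr) drop out */
data pe_w;
  set pe;
  if StdErr > 0;
  w  = 1 / StdErr**2;
  wx = w * Estimate;
run;

/* Sum the weights and weighted estimates for each parameter */
proc means data=pe_w noprint nway missing;
  class Effect toxin yeargrp SES;
  var w wx;
  output out=pooled_sums sum=sum_w sum_wx;
run;

data pooled;
  set pooled_sums;
  pooled_est = sum_wx / sum_w;
  pooled_se  = sqrt(1 / sum_w);
run;
```

This pooling treats the subsets as independent and estimates the region-level variance separately in each subset, so the combined standard errors should be read with caution.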
SteveDenham