I have a repeated-measures data set with 1.8 million observations, and each individual has more than one record. I would like to explore the relationship between the outcome and the exposure using PROC GLIMMIX, like this:
proc glimmix data=mydata;
class id stage;
model outcome=exposure confoundings / solution cl;
random int / subject=id(stage);
run;
However, the SAS log reports this error: "Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system. Consider changing your model."
Could I use PROC GENMOD with a REPEATED statement instead to fit my model?
proc genmod data=mydata;
class id stage; /* the WITHIN= variable must be listed in CLASS */
model outcome=exposure confoundings ;
repeated subject=id /within=stage;
run;
First, what sort of variable is 'outcome'? I don't see a distribution specified in the code for either GLIMMIX or GENMOD, which means the default distribution is Gaussian. If that is the case, then consider using PROC HPMIXED. The overview in the documentation suggests that your situation (a large number of observations) is what HPMIXED was designed for.
The code would resemble this, to start:
proc hpmixed data=mydata;
class id stage;
model outcome=exposure confoundings / solution cl;
random int / subject=id(stage);
test exposure confoundings;
run;
Depending on the nature of exposure and confoundings, they may need to be added to the CLASS statement.
SteveDenham
Yes, the distribution is Gaussian, and thank you for your help.
Plus, we have yet to recognize the repeated-measures nature of the data with an appropriate REPEATED statement. The current version only includes a G-side random intercept. The variable that indexes the repeated measures has not been mentioned yet.
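As a rough sketch of what adding a REPEATED statement to the HPMIXED code might look like: the variable visit below is hypothetical (the thread never names the variable that indexes the repeated measures), and the AR(1) structure is only an illustrative assumption, not a recommendation for these data.

```sas
/* Sketch only: "visit" is a hypothetical variable that indexes the
   repeated measurements within each subject; substitute the real one. */
proc hpmixed data=mydata;
   class id stage visit;
   model outcome = exposure confoundings / solution cl;
   /* G-side random intercept, as in the earlier code */
   random int / subject=id(stage);
   /* R-side within-subject correlation; type=ar(1) is an assumed choice */
   repeated visit / subject=id(stage) type=ar(1);
run;
```

Whether this fits in memory with 1.8 million observations would have to be tried; a simpler R-side structure may be needed.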
SteveDenham
Thank you for your help. I have read some articles about this issue, and a GEE model with a REPEATED statement seems more suitable for my data.
If you run into memory issues with PROC GEE, then you have this to fall back on.
SteveDenham
If you want to fit a GEE model, I suggest using the newer PROC GEE rather than PROC GENMOD. The syntax is the same.
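For illustration, the GENMOD code from the question might translate to PROC GEE roughly as follows. This is a minimal sketch: the exchangeable working correlation (type=exch) is an assumption chosen for the example, and DIST=NORMAL is written out only to make the Gaussian outcome explicit.

```sas
/* Sketch only: the working correlation structure is an assumption */
proc gee data=mydata;
   class id stage;   /* the WITHIN= variable must be a CLASS variable */
   model outcome = exposure confoundings / dist=normal link=identity;
   /* corrw prints the estimated working correlation matrix */
   repeated subject=id / within=stage type=exch corrw;
run;
```

Other TYPE= choices such as ind, ar(1), or un can be swapped in; the empirical (sandwich) standard errors that GEE reports remain valid even if the working structure is misspecified.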