Hello,
I am doing a multilevel analysis using the PROC GLIMMIX procedure in SAS. The data are categorical, with a very small number of events. They were collected using multi-stage cluster sampling, i.e., individuals were nested within clusters, and clusters were nested within regions. The total sample size is 6954 and there are 254 events, which is only 3.65% of the total sample.
Since the number of events is very small, I am getting an error message when I run the procedure. Is there another statistical method that can accommodate this? I would really appreciate your advice and support in this regard.
The sample code I used is below:
proc glimmix data = home.caesarean2 method = laplace;
class region hregion;
model M17_1 (event = last) = v012 v106 v717 v501 v701 v705 v150h v136 v190 v130 v743a v201 m14_1 m10_1 v212
FACTYPE distance csa1 csa2 csr1 csr2 gr1 gr2 hfma v025
HREGION/s cl dist = binary link = logit;
random intercept v012 v106 v717 v501 v701 v705 v150h v136 v190 v130 v743a v201 m14_1 m10_1 v212
FACTYPE distance csa1 csa2 csr1 csr2 gr1 gr2 hfma v025 /subject = region cl s type = vc;
random intercept v012 v106 v717 v501 v701 v705 v150h v136 v190 v130 v743a v201 m14_1 m10_1 v212
/subject = v001 (region) cl s type = vc;
covtest / wald;
run;
Kind regards
Teketo
When you get an error it is very helpful to copy the code and error message(s) both from the log and paste into a code box opened using the forum's {I} icon.
As a minimum, all CLASS variables must appear in the MODEL statement. I don't see REGION in the MODEL statement, although it is in the CLASS statement.
As @ballardw says, it would be very helpful to have details about the nature of the errors.
Your model is very ambitious: it specifies a lot of parameters to estimate, and I would not be surprised if you've just run out of data to support all that estimation.
GLIMMIX allows SUBJECTs in RANDOM statements to be continuous rather than classification variables, but see https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glimmix_sec... where it says:
SUBJECT=effect
SUB=effect
identifies the subjects in your generalized linear mixed model. Complete independence is assumed across subjects. Specifying a subject effect is equivalent to nesting all other effects in the RANDOM statement within the subject effect.
Continuous variables and computed variables are permitted with the SUBJECT= option. PROC GLIMMIX does not sort by the values of the continuous variable but considers the data to be from a new subject whenever the value of the continuous variable changes from the previous observation. Using a continuous variable can decrease execution time for models with a large number of subjects and also prevents the production of a large "Class Levels Information" table.
So if REGION and V001(REGION) are not sorted appropriately, you could be specifying too many subjects.
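One way to rule this out is to sort the data by the subject variables before the GLIMMIX step. A minimal sketch, assuming the data set name from the original post; the output data set name work.caesarean2_sorted is made up for the example:

```sas
/* Sketch only: make observations from the same cluster contiguous, so that a
   subject variable that is not in CLASS does not start a "new subject"
   every time its value changes from one row to the next. */
proc sort data=home.caesarean2
          out=work.caesarean2_sorted;   /* hypothetical output name */
  by region v001;
run;
```

Alternatively, listing V001 in the CLASS statement removes the dependence on sort order, at the cost of a larger "Class Levels Information" table.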
I think your specification of random slopes at both REGION and V001(REGION) levels could be incorrect. I would think that you would use means (computed over the V001 levels within each REGION) as predictor variables at the REGION level. The paper by Judith Singer does a good job of developing this idea; there are lots of other resources as well, of course.
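As a rough illustration of what I mean by region-level means, the sketch below attaches the regional mean of one predictor to every observation so that it can be used as a REGION-level covariate. The choice of v012 is arbitrary, and the names work.caesarean2_means and v012_region_mean are made up for the example:

```sas
/* Sketch only: compute the mean of one predictor within each region and
   merge it back onto the individual-level records. */
proc sql;
  create table work.caesarean2_means as      /* hypothetical output name */
  select a.*,
         b.v012_region_mean                  /* hypothetical variable name */
  from home.caesarean2 as a
       left join
       (select region,
               mean(v012) as v012_region_mean
          from home.caesarean2
         group by region) as b
    on a.region = b.region;
quit;
```

The same pattern, grouped by region and v001, would give cluster-level means if those are wanted as well.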
In addition to considering a different model specification and understanding more about what you are attempting, I would start simply and build up--in other words, do not throw all of the predictors into the model at once. With this many continuous predictor variables, assessing the linearity assumption will be a challenge.
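A possible starting point, as a sketch rather than a recommendation for the final model: random intercepts only at both levels, with a couple of fixed effects. The two predictors shown (v012 and v106) are picked arbitrarily from your list; which ones to begin with is your call.

```sas
/* Sketch only: random-intercept model at the region and cluster levels.
   Add predictors (and, if justified, a random slope) one step at a time. */
proc glimmix data=home.caesarean2 method=laplace;
  class region v001;                         /* v001 in CLASS: no reliance on sort order */
  model M17_1 (event=last) = v012 v106
        / solution cl dist=binary link=logit;
  random intercept / subject=region;         /* region-level variance */
  random intercept / subject=v001(region);   /* cluster-within-region variance */
  covtest / wald;
run;
```

If this runs, the covariance parameter estimates will show whether there is any detectable variation at each level before more terms are added.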
I hope this helps.
Hello,
Many thanks.
I did include all the class variables in the model statement.
The SUBJECTs in the RANDOM statement are discrete, i.e. REGION ranges from 1 to 11 and V001 from 1 to 622.
SAS stops processing the procedure, and I get the following error message:
{
proc glimmix data = cs.caesarean2 method = laplace;
class region hregion;
model M17_1 (event = last) = v012 v106 v717 v501 v701 v705 v150h v136 v190 v130 v743a v201
m14_1 m10_1 v212
FACTYPE distance csa1 csa2 csr1 csr2 gr1 gr2 hfma v025
region HREGION/s cl dist = binary link = logit;
random intercept v012 v106 v717 v501 v701 v705 v150h v136 v190 v130 v743a v201 m14_1 m10_1 v212
FACTYPE distance csa1 csa2 csr1 csr2 gr1 gr2 hfma v025 /subject = region cl s type = vc;
random intercept v012 v106 v717 v501 v701 v705 v150h v136 v190 v130 v743a v201 m14_1 m10_1 v212
/subject = v001 (region) cl s type = vc;
covtest / wald;
run;
NOTE: The GLIMMIX procedure is modeling the probability that M17_1='1'.
ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 7.40 seconds
cpu time 6.61 seconds
}
Kind regards
Teketo
Ah, yes, you do have REGION as a CLASS variable, my apologies. But V001 is not in CLASS, so sorting would be a necessary concern.
Still, your model is attempting to estimate 622 random effects (i.e., one slope for EACH level of V001) for EACH continuous predictor variable (of which there are 15). That is a lot, a lot of parameter estimates: even if you had enough data, you don't have enough memory. I'm still thinking that you are asking way too much of your model, and that you need to ponder what statistical model mirrors your sampling design, what you want, and what is possible. Push back from the keyboard, and give it some thought.