Hello,
I have a nested data where individuals are nested within clusters and clusters nested within regions. I am trying model three level hierarchical model using PROC MCMC; however, I have faced difficulty in fitting the random effects.
In PROC MIXED and PROC GLIMMIX we do have option to show that one is nested within the higher level unit like cluster is nested with region.
Proc glimmix data = care;
Class region cluster;
Model y (event = last) = v01 v02 v03 v04 v05 v06 v07 v08 / s dist = binary link = logit;
Random int v01 v02 v03 v04 v05 v06 v07 v08 / sub = region cl s type = vc;
Random int v01 v02 v03 v04 / sub = cluster (region) cl s type = vc;
Covtest / wald;
Run;
How can I model the random intercept and random slope model showing individuals are nested with clusters and clusters with regions exactly like what I model them in proc glimmix?
Proc mcmc data = Care seed = 5235 nmc = 100000 outpost = xcare;
Parms B0-B4 S2;
Parms S2g S2d;
Prior B: ~ normal (0, var = 1e6);
Prior S2 ~ igamma (shape = 0.01, scale = 0.01);
Prior S2g ~ igamma (shape = 0.01, sacle = 0.01);
Prior S2d ~ igamma (shape = 0.01, sacle = 0.01);
random gamma ~ normal (0, var = S2g) subject = region;
random delta ~ normal (0, var = S2d) subject = cluster;
Mu = B0 + B1*v01 + B2*v02 + B3*v03 + B4*v04;
P = logistic (Mu + gamma + delta)
Model y ~ binary (P);
Run;
How do I manage the classification variables in PROC MCMC: region and cluster?
Regards
Teketo
See the details section in the documentation "Create Design Matrix".
Hi,
If I understand the question correctly, your data doesn't have an explicit ClusterXRegion indicator.
I would create a ClusterXRegion variable in a datastep, e.g.
ClusterXRegion = compress(put(cluster,best.) || "_" || put(region,best.));
then replace
random delta ~ normal (0, var = S2d) subject = cluster;
with
random delta ~ normal (0, var = S2d) subject = clusterXRegion;
Hi Garnett,
Many thanks.
Does subject = clusterXRegion in proc mcmc has the same meaning as subject = cluster (region) that I used in proc glimmix?
Does it mean cluster is nested within region, which I am interested in?
Regards
Hi there,
The answer is yes, if I understand the data structure correctly. It's easy enough to test the model in GLIMMIX and MCMC without covariates!
Garnett
Hi Garnett,
Many thanks. I got it clear now.
However, I have ambiguities about how to specify the random intercepts and slopes for three level hierarchical model in Proc MCMC. Lets assume that we have 4 individual level variables (V01, V02, V03, V04), 3 variables at cluster level (V05, V06, V07) and 2 region level variables (V08, V09).
In PROC GLIMMIX model, I can build the model like this;
Proc glimmix data = care;
Class cluster region;
model y (event = last) = V01 V02 V03 V04 V05 V06 V07 V08 V09 / S;
random int V01 V02 V03 V04
V05 V06 V07 / Subject = region; *7 random slopes and a random intercept;
random int V01 V02 V03 V04 / Subject = cluster; *4 random slopes and a random intercept;
run;
Here is my attempt in PROC MCMC;
Data xcare;
V11 = compress (cluster) || "_" || compress(region);
Run;
Proc mcmc data = xcare seed = 570 nmc = 100000 outpost = mcmc_care;
Parms B0-B9; *Since I have nine fixed effects as it is built in proc glimmix;
Parms S2g S2d;
Prior B: ~ normal (0, var = 1e6);
Prior S2g ~ igamma (shape = 0.01, sacle = 0.01);
Prior S2d ~ igamma (shape = 0.01, sacle = 0.01);
random gamma? ~ normal (0, var = S2g) subject = region; *How can I specify the 7 random slopes and random intercept? ;
random delta? ~ normal (0, var = S2d) subject = V11; *How can I specify the 4 random slopes and random intercept? ;
Mu = B0 + B1*V01 + B2*V02 + B3*V03 + B4*V04 + B5*V05 + B6*V06 + B7*V07 + B8*V08 + B9*V09;
P = logistic (Mu + gamma? + delta?);
Model y ~ binary (P);
Run;
How can I specify the same model like proc glimmix with seven random slopes and a random intercept for the Subject = region and the four random slopes and a random intercept for the subject = V11?
Regards
"How can I specify the same model like proc glimmix with seven random slopes and a random intercept for the Subject = region and the four random slopes and a random intercept for the subject = V11"
You need to have a RANDOM statement for each component, i.e.
Random V01~Normal(0, var=VarV01) subject=Region;
Random V02~Normal(0, var=VarV02) subject=Region;
etc.
Same idea for subject=V11.
Note the different variances for each "random effect". However, you can specify any kind of covariance structure you like.
Hi Garnett,
Many thanks. I have got a very clear explanation and solution.
However, I came up with another problem while running my actual data. I have got 11 (Level 3: subject = region) and 622 (Level 2: subject = V11) subject effects. Is there any limit on the maximum number of subjects? How do the subject effects affect the convergence?
Moreover, is there a limit on the number of fixed effects and random effects on the model building process? For instance, I have got 15 - 20 fixed effects and 7 - 10 random effects on a three-level mixed effect model that I will be identifying which variables have an effect on the outcome of interest. The computer is taking too long to finish, most of the cases, it does not finish and ends up with error messages.
Regards
Teketo
I assume we are talking about PROC MCMC:
"Is there any limit on the maximum number of subjects?"
-Not that I'm aware of. I've run models with 1000s of subjects.
"How do the subject effects affect the convergence?"
-It doesn't, but a lot depends on the choice of priors and the amount of data you have on the individual subjects.
"Moreover, is there a limit on the number of fixed effects and random effects on the model building process?"
-Not that I've ever encountered.
"The computer is taking too long to finish, most of the cases, it does not finish and ends up with error messages."
-My guess is that the model is specified incorrectly for the data structure. What do you mean "it does not finish"?
What error messages are you getting?
You must, must, must test a much reduced version of the model for stability!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.