BookmarkSubscribeRSS Feed
Vivyw
Calcite | Level 5

How do you fit a two-part mixed effects model in SAS? I have clustered data (crossed design) and the response variable is semi-continuous with many zeros and is bounded in [0,1). I want to fit a two-part mixed effects model to this data including the assumption that the two parts are correlated. So far I have tried fitting this model using PROC GLIMMIX. 

 

The outcome variable is y. First, to prepare the data for Proc Glimmix, I do the following data step to create a new string variable called "distribution" that indicates whether the observation is "Binary" or "normal" so as to fit two parts with different distributions. I have transformed the nonzero part of y using ARSIN(SQRT(.)). The variable "zero" is coded as 0 if y=0 and 1 if y>0. 

 

data data2;
length distribution $7;
set data1;
response = (zero = 1);
distribution = "Binary";
output;
response = y;
distribution = "Normal";
output;
keep ID1 ID2 response distribution x1 x2 x3 x4;
run;

 

The variable "response" will then be used as outcome variable in PROC GLIMMIX as below; 

 

proc glimmix data = data2 method =rmpl noclprint pconv=1e-10 maxopt=100 ;
class ID1 ID2 distribution x1 x2 x3 x4 ;
model response(event ="1") = distribution distribution*x1 distribution*x2
distribution*x3 distribution*x4 / noint solution dist=byobs(distribution) ddfm=bw;
random distribution/ subject = ID1 type = un solution ;
random distribution / subject = ID2 type =un solution ;
nloptions tech=nrridg maxiter=1000 gconv=0;
ods exclude solutionr ; 
run;

This model, however, does not converge. Does anyone have an idea on how to fit such models or how to adjust the Proc Glimmix above? 

 

Thank you! 

 

NOTE: The code above was adapted from the article below but there they do not have crossed random effects: 

Baldwin, S. A., Fellingham, G. W., & Baldwin, A. S. (2016). Statistical models for multilevel skewed physical activity data in health research and behavioral medicine. Health Psychology, 35(6), 552.

 

4 REPLIES 4
SteveDenham
Jade | Level 19

PROC GLIMMIX defaults to a maximum iteration of 20, so almost always the first thing I do, is use an NLOPTIONS statement to set MAXITER, like this:

 

nloptions maxiter=1000;

Your use of maxopt=100 in the PROC GLIMMIX statement does allow for 100 iterations, but you may need more, provided the likelihood function is well-behaved.

The next thing I would consider is to not use ARSIN(SQRT()) as the transformation outside of GLIMMIX.  Rather, I would use the canonical link function (logit)..

 

So on to the values you see in the iteration history.  Do they seem to be converging (although slowly), or is there something unusual happening?  Is it growing by a fixed amount at each step?  Does it look like it is trying to converge, only to jump to some value that seems pathological?  Does it look like it should be called converged at some point, but just keeps going?  Each of these is a symptom of something different.

 

SteveDenham

Vivyw
Calcite | Level 5

Thanks for your reply. 

Do you mean not to transform but use beta distribution (with logit link)?

 

For the values in the iteration history, they seem like it should have converged at some point, but just keeps going. 

 

 

tellmeaboutityo
Obsidian | Level 7
If I'm not mistaken, it's typical to create a cross classified variable rather than to treat the two classes as separate levels, which would assume full nesting rather than cross-classification. So instead of a three-level model, like you have, you would have only a two-level model, with your cross-classified variable defining the second level. For example:

data want; set have;
crossclassifier = catt(ID1, "-", ID2);
run;

proc glimmix data = want method =rmpl noclprint pconv=1e-10 maxopt=100 ;
class crossclassifier distribution x1 x2 x3 x4 ;
model response(event ="1") = distribution distribution*x1 distribution*x2
distribution*x3 distribution*x4 / noint solution dist=byobs(distribution) ddfm=bw;
random distribution/ subject = crossclassifier type = un solution ;
nloptions tech=nrridg maxiter=1000 gconv=0;
ods exclude solutionr ;
run;

Raudenbush, S. W., & Bryk, A. S. (2002). Chapter 12: Models for Cross-Classified Random Effects. Hierarchical linear models: Applications and data analysis methods (Vol. 1). sage.

SteveDenham
Jade | Level 19

@tellmeaboutityo , I had not seen that article and method.  What a good idea!

 

SteveDenham

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1126 views
  • 2 likes
  • 3 in conversation