Laian_N
Calcite | Level 5

Hi,

 

I am using SAS version 9.4 to examine the association between physical activity (PA) and alcohol consumption (AC) at between- and within-person levels over 21 days. The study design is multilevel with repeated measures, with days nested in individuals (253 individuals - 21 days/measurements). My time structure is days (21 days - 0,....,21). My outcome variable is AC (total drinks per day per individual - count data) and it follows a Poisson distribution (there is evidence of overdispersion, so I am trying to fit a negative binomial distribution). Predictors are average PA (person-specific average measures of PA centered at the grand mean) and daily PA (person-centered).

 

Issue 1: The random intercept model converged fine but the model didn't fit the data very well.

Issue 2: When I added average PA, the model failed to converge.

 

I have tried different types of covariance structures, but none seemed to work. I am wondering if the issue could be zero-inflation in the outcome variable, as 70% of my AC observations are 0? I found nothing on GLIMMIX dealing with zero-inflated negbin though, so I am wondering whether I need to use another procedure? Also, my PA variables are significantly positively skewed, so I am wondering if that could be a part of the issue and whether I need to transform the predictors? I'd very much appreciate any input, thanks!

 

Below is the code I used:

 

*random intercept model that converged but didn't fit the data very well*;

proc glimmix data=SHB.Days_final lognote plots=all;
class IDnum Days_c;
model AC = / solution dist=negbin;
random Days_c / subject=IDnum type=ar(1) residual;
nloptions maxiter=500;
run;

*model with predictor that did not converge*;

proc glimmix data=SHB.Days_final lognote plots=all;
class IDnum Days_c;
model AC = averagePA / solution dist=negbin;
random Days_c / subject=IDnum type=ar(1) residual;
nloptions maxiter=500;
run;

 

 

Accepted Solutions
SteveDenham
Jade | Level 19

Hi Laian,

 

Try the following versions of your two models:

 

*random intercept model*;

proc glimmix data=SHB.Days_final lognote plots=all;
class IDnum Days_c;
/* Days_c now also enters as a fixed effect */
model AC = Days_c / solution dist=negbin;
/* G-side random intercept for each person */
random intercept / subject=IDnum;
/* R-side AR(1) correlation among a person's daily residuals */
random Days_c / subject=IDnum type=ar(1) residual;
nloptions maxiter=500;
run;

*model with predictor that did not converge*;

proc glimmix data=SHB.Days_final lognote plots=all;
class IDnum Days_c;
/* same structure, with the between-person predictor added */
model AC = averagePA Days_c / solution dist=negbin;
random intercept / subject=IDnum;
random Days_c / subject=IDnum type=ar(1) residual;
nloptions maxiter=500;
run;

The major differences are (1) including Days_c in the MODEL statement, so that the fixed day effect is removed from the residuals whose repeated-measures correlation you are modeling, and (2) adding a second RANDOM statement that explicitly models a random intercept.

 

Give these a try.  If there is still a convergence problem, you may have identified the issue as zero-inflation.  I am sure somebody has figured a way to address this in a mixed model setting, but I'm coming up blank off the top of my head.

 

SteveDenham

 

 

Laian_N
Calcite | Level 5

Thanks for your speedy response! And yes, the models did converge, thank you, but the data does not appear to fit very well, which may be due to zero-inflation.

SteveDenham
Jade | Level 19

I suppose you could do a two-stage analysis. Stage 1 would be a logistic link analysis, with any 0 = 0 and any value>0 set to 1.  That would set a hurdle for the zero inflation.  Then fit the non-zero responses to the negbin distribution.  More sophisticated would be to get the predicted value for each case as 0 or 1 from the logistic analysis, and then fit the negbin to the predicted 1 cases.  The key here is to look at various cutpoints for the classification (sensitivity analysis).
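
A minimal sketch of the simpler two-stage idea, for anyone following along. It assumes the data set and variable names from the original post; the indicator anyAC, the work.hurdle data set, the random-intercept-only structure, and method=laplace are illustrative choices, not something tested on these data. Note that stage 2 fits an ordinary negbin to the positive counts rather than a zero-truncated one, so it is only an approximation to a formal hurdle model.

```sas
/* Stage 0: build a 0/1 indicator for "any drinking that day".      */
/* anyAC and work.hurdle are hypothetical names for illustration.   */
data work.hurdle;
   set SHB.Days_final;
   if missing(AC) then anyAC = .;
   else anyAC = (AC > 0);
run;

/* Stage 1: logistic mixed model for zero vs. non-zero days */
proc glimmix data=work.hurdle method=laplace;
   class IDnum;
   model anyAC(event='1') = averagePA / solution dist=binary link=logit;
   random intercept / subject=IDnum;
run;

/* Stage 2: negbin mixed model on drinking days only.               */
/* Not zero-truncated, so treat the estimates as approximate.       */
proc glimmix data=work.hurdle method=laplace;
   where AC > 0;
   class IDnum;
   model AC = averagePA / solution dist=negbin;
   random intercept / subject=IDnum;
   nloptions maxiter=500;
run;
```

The two sets of estimates answer different questions: stage 1 describes whether a person drinks on a given day, and stage 2 describes how much they drink on the days they do.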

 

SteveDenham

Laian_N
Calcite | Level 5

"I suppose you could do a two-stage analysis. Stage 1 would be a logistic link analysis, with any 0 = 0 and any value>0 set to 1.  That would set a hurdle for the zero inflation.  Then fit the non-zero responses to the negbin distribution."

 

Thanks! I tried the above suggestion, but noticed a large imbalance in days when dropping 0s to run the negbin distribution, with some participants having only 1 or 2 days of data while others have many more (up to 21 days). Given that this is a repeated measures design, I am not sure how well I could interpret the results. I also assume this method would require a two-stage interpretation of the results?

 

I will look into the second method you suggested. Thanks very much for your help!

SteveDenham
Jade | Level 19

Well, so long as the drop out rate is not influenced by the predictors in the model, the mixed model approach is really pretty good in the face of imbalance.  You would probably want to look at the SOLUTION vector for the RANDOM statements to get an idea of how much the drop out is affecting the predicted values.
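
One way to look at those random-effect solutions is sketched below. It assumes the positive-counts model uses a simple random intercept; the output data set name work.blups is hypothetical. The SOLUTION option on the RANDOM statement requests the predicted random effects, and ODS table SolutionR captures them.

```sas
/* Request the predicted random intercepts (BLUP-type predictions)  */
/* for the drinking-days model and save them to a data set.         */
proc glimmix data=SHB.Days_final method=laplace;
   where AC > 0;
   class IDnum;
   model AC = averagePA / solution dist=negbin;
   random intercept / subject=IDnum solution;
   ods output SolutionR=work.blups;   /* work.blups is a hypothetical name */
run;

/* Sort so the most extreme person-level predictions are easy to inspect */
proc sort data=work.blups;
   by Estimate;
run;

proc print data=work.blups(obs=10);
run;
```

Comparing the extreme predictions against how many drinking days each person contributed gives a rough sense of whether people with only 1 or 2 days are being shrunk heavily toward the overall mean.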

 

SteveDenham

Laian_N
Calcite | Level 5

Thanks, Steve. I've found a few sources that may also help others fit hurdle models or use PROC NLMIXED to fit two or more distributions.

 

Min, Y., & Agresti, A. (2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modelling: An International Journal, 5(1), 1–19. https://doi.org/10.1191/1471082X05st084oa

Zhu, H., Luo, S., & DeSantis, S. M. (2017). Zero-inflated count models for longitudinal measurements with heterogeneous random effects. Statistical Methods in Medical Research, 26(4), 1774–1786. https://doi.org/10.1177/0962280215588224

 

https://support.sas.com/resources/papers/proceedings17/0902-2017.pdf

https://support.sas.com/kb/48/506.html

https://video.sas.com/detail/video/6096489321001/handling-excess-zeros-with-fmm-procedure-proc-fmm
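
For anyone who wants a starting point, here is a rough sketch of a zero-inflated negative binomial with a person-level random intercept in PROC NLMIXED, in the spirit of the sources above. It is not taken from any of them. The parameter names (b0, b1, a0, logk, logsd), the starting values, the constant zero-inflation probability, and the work.days_sorted data set are all assumptions for illustration and would need adapting and checking against real data.

```sas
/* NLMIXED expects each subject's records to be contiguous */
proc sort data=SHB.Days_final out=work.days_sorted;
   by IDnum;
run;

proc nlmixed data=work.days_sorted;
   /* hypothetical starting values */
   parms b0=0 b1=0 a0=0 logk=0 logsd=0;

   /* zero-inflation probability (logit scale, intercept only) */
   p0 = 1 / (1 + exp(-a0));

   /* negbin mean with random intercept u, and dispersion k > 0 */
   mu  = exp(b0 + b1*averagePA + u);
   k   = exp(logk);
   s2u = exp(2*logsd);

   /* log-likelihood: zeros are a mixture of structural zeros and  */
   /* negbin zeros; positive counts come from the negbin part only */
   if AC = 0 then
      ll = log(p0 + (1 - p0) * (1 + k*mu)**(-1/k));
   else
      ll = log(1 - p0)
           + lgamma(AC + 1/k) - lgamma(1/k) - lgamma(AC + 1)
           + AC*log(k*mu) - (AC + 1/k)*log(1 + k*mu);

   model AC ~ general(ll);
   random u ~ normal(0, s2u) subject=IDnum;
run;
```

Parameterizing the dispersion and the random-intercept standard deviation on the log scale keeps both positive without boundary constraints. Predictors such as daily PA could be added to either the count part or the zero-inflation part in the same way as averagePA.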

 

SteveDenham
Jade | Level 19

The Zhu et al. paper is what I was thinking about, and Robin High is a tremendous source for NLMIXED approaches. I believe these were the two sources I had in mind but couldn't remember.

 

SteveDenham
