BookmarkSubscribeRSS Feed
lotcarrots
Calcite | Level 5

Dear all,


My goal is to fit a longitudinal zero-inflated negative binomial model to my data. To my knowledge, proc mixed does not provide other distributions than normal; proc genmod has a repeated statement, but not for ZINB; and proc glimmix has a random statement for the NB, but not ZINB. So in the end I'll have to work with PROC NLMIXED.

Since I have not yet worked with NLMIXED, I try to proceed step by step. For now I work under the assumption that a negative binomial distribution is sufficient and I would like to transfer my GLIMMIX code into NLMIXED. Here I come across the problem that I'm not sure about choosing the right random effects.

The data is the following:
There are 36 samples. The samples were split into two sub-samples and stored once at 5 degrees and once at 17 degrees. The sub-samples were measured at the beginning, after 24 h and after 72 h. Altogether, the dataset contains 216 observations (with the variables POOL: 1-36, TEMPERATUR: 5/17, ZP: 0/24/72) for the number of germs (KEIMZAHL).

Here is a plot of the data, separated according to the temperature:

KeimVerlauf.PNG

Now I wonder how to properly specify the dependency and hierarchy in my data in the code. Is it correct that I indicate the dependency in GLIMMIX with a random statement like this?
   random int / subject = pool;
and the hierarchy
   random ZP / subject = temperature (pool) type = cs residual;
(For the sake of simplicity, I chose "cs" as the correlation structure. The times are not equidistant, so we can discuss this. )

Overall, my code looks like this:

PROC GLIMMIX DATA = FV1_2016;
nloptions maxiter = 1000;
  CLASS Pool ZP (ref = "0") Temperatur (ref = "5");
  MODEL Keimzahl = Temperatur | ZP / s dist = negbin;
  random int / subject = pool;
  random ZP / subject = Temperatur(Pool) type = cs residual;
RUN;


I am able to transfer the first part to NLMIXED. Again I am not sure about defining random effects in NLMIXED.

 

For NLMIXED I have to restructure my record. I do this:

DATA FV1_2016;
 SET FV1_2016;
/ * Temperature as a dummy * /
 IF Temperatur = 17 THEN dummy_temp17 = 1;
ELSE dummy_temp17 = 0;

/ * Time as a dummy * /
IF ZP = 24 THEN dummy_zp24 = 1;
ELSE dummy_zp24 = 0;
IF ZP = 72 THEN dummy_zp72 = 1;
ELSE dummy_zp72 = 0;
RUN;


My NLMIXED looks like this:

proc nlmixed data = FV1_2016 qpoints = 10 max = 500;
  parms b0 = 6.58 b1 = -1.75 b2 = -0.41 b3 = -0.87 b4 = -1.73 b5 = 1.68 alpha = 0.1 sigma2b = 0.1 to 10.1 by 1 sigma2b1 = 0.1 to 10.1 by 1;
  mu = exp (b0 + b1 * dummy_temp17 + b2 * dummy_zp24 + b3 * dummy_zp72 + b4 * dummy_temp17 * dummy_zp24 + b5 * dummy_temp17 * dummy_zp72 + c0 + c1 * ZP);
  model Keimzahl ~ negbin (1 / alpha, 1 / (1 + mu * alpha));
  random c0 c1 ~ normal ([0, 0], [sigma2b, 0, sigma2b1]) subject = pool;
  predict mu out = means;
  ods output ParameterEstimates = est_random;
run;


I realize that this produces different results because the REs are not set properly yet.

I have a few more questions, but I would like to ask this one first and then work on it a little bit before asking the next one.

Your help is very welcome!

2 REPLIES 2
Rick_SAS
SAS Super FREQ

It is certainly possible to do this analysis with PROC NLMIXED, but you might consider using PROC FMM instead. There is a Getting Started example in the doc that shows how to run a zero-inflated Poisson model. If you follow that example but change the distribution to DIST=NEGBIN, I think PROC FMM can solve your problem. If I'm wrong, write back and someone can help you with NLMIXED.

lotcarrots
Calcite | Level 5

Thank you for your reference to PROC FMM, which is also new to me and therefore very interesting. I looked at your example and also read this paper https://support.sas.com/rnd/app/stat/papers/abstracts/328-2012.html

in which the link to the hurdle-model became clearer to me.

However, I am not able to include a random effect for my repeated measurements in PROC FMM, since it is only for use with fixed effects, am I right? Furthermore, the ref statements do not work. Is it possible to change the reference categories in PROC FMM? Thirdly, if I just look at the fit statistics of the two models, NB and ZINB, in PROC FMM, then my Pearson statistic suddenly gets larger. Why this?

FitStats_NB_ZINB.PNG

Added: This would be my mixture model

proc fmm data=FV1_2016;
class Pool ZP Temperatur;
model Keimzahl = Temperatur|ZP / dist=negbin; /* not truncated, zeros can come from both components */
model Keimzahl =             / dist=Constant; /* constant distr for the zero group */
output out = fmm_ZINB residual pred;
run;

Unfortunately, the residuals do not look that good either:

FMM_NB_Residuals.PNG

 

 

After some considerations, I came to the following conclusion with my approach in GLIMMIX:

I just need a random effect for the repetitions, i.e. a two-way model with paired samples and repeated measurements. This should work with

 

PROC GLIMMIX DATA=FV1_2016;
nloptions maxiter=1000;
CLASS Pool ZP Temperatur;
MODEL Keimzahl =  Temperatur|ZP / s dist=negbin;
random _residual_ / subject=Pool type=UN;
LSMEANS ZP|Temperatur;
RUN;

Do you think that it is sufficient? Just weird that it only converges with UN or ar(1) as covariance structure, which does not make it any easier for my idea with PROC NLMIXED. 😞

FitStats_NB_UN_AR1.PNG

How can I compare both the UN-model and the AR(1)-model in terms of AIC? Actually, I do not have equal spacing, thus I would prefer the second model.

 

I really appreciate your feedback on my thoughts and questions.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1075 views
  • 0 likes
  • 2 in conversation