PSB
Fluorite | Level 6

Hi all,

I'm running ZINB via PROC GENMOD. Most models have run smoothly, but I am running into an issue with one of them. I'm using various child characteristics to predict zero-inflated behavioral and emotional count outcomes. Most IVs are continuous except for medication status (yes vs. no). For one particular outcome, the logit portion of the model does not produce a p-value for my categorical variable and reports the same value for the estimate and both confidence limits; I'm not receiving any error messages. My online searches have mostly turned up results about missing values.

 

Here is my code and output for the logit model.

Any resources/thoughts would be greatly appreciated!

 

TITLE 'ZINB, DV = Anger Emo log';
PROC GENMOD DATA = datafile;
  CLASS Meds;
  /* Count part: negative binomial with log link and offset */
  MODEL Emotion_anger = Meds time x1 x2 x3 / LINK = log DIST = zinb OFFSET = log_var;
  /* Zero-inflation (logit) part */
  ZEROMODEL Meds time x1 x2 x3;
RUN;

 

Logit Model Output:

Analysis Of Maximum Likelihood Zero Inflation Parameter Estimates

Parameter   DF   Estimate   SE       Wald 95% CI Lower   Wald 95% CI Upper   Wald Chi-Square   p-value
Intercept    1   -25.4554   0.7635   -26.9518            -23.9591            1111.72           <.0001
Meds yes     0    24.4313   0.0000    24.4313             24.4313            .                 .
Meds no      0     0.0000   0.0000     0.0000              0.0000            .                 .
time         1     0.0725   0.1813    -0.2828              0.4279            0.16              0.6891
x1           1     5.3979   1.9872     1.5031              9.2927            7.38              0.0066
x2           1     4.1286   1.5235     1.1425              7.1146            7.34              0.0067
x3           1    -0.1602   0.0610    -0.2798             -0.0406            6.89              0.0087

 

10 REPLIES
PaigeMiller
Diamond | Level 26

This is how the math that SAS uses works when you have a categorical predictor variable. One level (in this case MEDS=NO) will have zero degrees of freedom and a zero estimate and no p-value. When you have two levels of a categorical variable, there is only 1 degree of freedom for this categorical variable and you can only estimate one model coefficient. I wrote a simple explanation here.
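To make the reference-level idea concrete, here is a minimal sketch that names the reference level explicitly. It assumes Meds is stored (or formatted) as the strings 'yes' and 'no'; the actual coding in datafile is not shown in this thread, so adjust the quoted value as needed. With the default GLM parameterization, the level named in REF= is placed last and is the row reported with a zero estimate and zero degrees of freedom.

```sas
/* Sketch only: 'no' is an assumed value of Meds; change it to match your data. */
PROC GENMOD DATA = datafile;
  /* REF= names the level whose coefficient is fixed at 0 (the reference level). */
  CLASS Meds (REF = 'no');
  MODEL Emotion_anger = Meds time x1 x2 x3 / LINK = log DIST = zinb OFFSET = log_var;
  ZEROMODEL Meds time x1 x2 x3;
RUN;
```

The remaining Meds row is then the yes-versus-no difference on the link scale, and in a well-behaved fit it should carry 1 degree of freedom, a nonzero standard error, and a p-value.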

--
Paige Miller
PSB
Fluorite | Level 6

Hi @PaigeMiller 

 

Thanks much for your reply. Your linked post offered a nice explanation for using LSMeans and the math that SAS uses. I have a few follow-up questions (apologies, I haven't used LSMeans before).

- Based upon your post, it sounds like you recommend using the LSMEANS command for each model to get a better sense of the estimates for categorical variables?

- The LSMeans command I used (LSMeans Meds / DIFF = ALL) yielded the following output along with a figure:

[Image: LSMEANS output table and figure (PSB_0-1681842154444.png)]

Am I correct in interpreting that both med conditions (yes and no) were predictive in the ZINB logit model? The difference in the estimates between these two conditions looks to be significant.

 

Sorry for the elementary questions and thanks in advance.

PaigeMiller
Diamond | Level 26

The first table indicates whether each coefficient estimate is different from zero. Both are different from zero when Pr > |z| is less than 0.05 (that is, at the 5% significance level). Had Pr > |z| been greater than 0.05, the coefficient estimate would not be statistically different from zero.

 

However, it is also true that the difference between the two coefficients is statistically different from zero. In this case, I would ignore the above paragraph and use both coefficients in the model.

--
Paige Miller
SteveDenham
Jade | Level 19

As far as the estimates in the model go, you are in a great position because you have a single, binary classification variable. Try adding the NOINT option to your MODEL statement, and see if the values and standard errors are more aligned with what you expect.  Without knowing the ranges for the continuous variables I can't be sure, but I would expect that the non-medicated group had a much lower rate than the medicated group, and that will probably seem more apparent when you remove the intercept. The estimates for the two groups are then the expected values when time, x1, x2 and x3 are all equal to zero. In other words, level intercepts for medicated and non-medicated.
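A minimal sketch of the NOINT suggestion, reusing the variable names from the original post. As far as I know, NOINT on the MODEL statement affects only the count part of the model, so treat this as a way to read level-specific intercepts there rather than a fix for the zero-inflation table.

```sas
/* Sketch only: same variables as the original post, with NOINT added. */
PROC GENMOD DATA = datafile;
  CLASS Meds;
  /* NOINT drops the common intercept, so each Meds level gets its own estimate. */
  MODEL Emotion_anger = Meds time x1 x2 x3 / NOINT LINK = log DIST = zinb OFFSET = log_var;
  ZEROMODEL Meds time x1 x2 x3;
RUN;
```

With this parameterization the two Meds rows in the count-model table are on the log scale; exponentiating them gives the expected rate per unit of the offset when time, x1, x2 and x3 are all zero.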

And since these are the MLE results, I don't believe the zero-inflation probability has been factored in yet.

 

SteveDenham.

StatDave
SAS Super FREQ

As indicated in this note, the LSMEANS statement should not be used to estimate a mean count or rate in zero-inflated models. The issue shown in your parameter estimates table suggests that the logistic model for the extra-zeros portion of the zero-inflated model cannot be well estimated. Specifically, the 0 df and zero standard error for the MEDS=YES parameter suggest that its estimate is infinite. You might want to consider dropping MEDS from the ZEROMODEL statement. Note that the only effects that need to be in the ZEROMODEL statement are effects that are associated with the occurrence of extra zeros (beyond what the negative binomial model allows). This is usually a subset of the variables affecting the mean as specified in the MODEL statement, and in fact could include one or more variables that don't affect the mean and are therefore not specified in the MODEL statement. To avoid estimation problems like the one you show, you might want to start with no ZEROMODEL effects and then add variables in that statement to find just the variables that are significant and can be properly estimated. Note that zero-inflated models can also be fit using PROC FMM and, in SAS/ETS, PROC COUNTREG.
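The build-up approach described above could look roughly like the sketch below. The order in which effects are added (time first) is an arbitrary choice for illustration, not a recommendation, and LINK = logit is written out only to make the default explicit.

```sas
/* Step 0: intercept-only zero-inflation model (no effects listed). */
PROC GENMOD DATA = datafile;
  CLASS Meds;
  MODEL Emotion_anger = Meds time x1 x2 x3 / LINK = log DIST = zinb OFFSET = log_var;
  ZEROMODEL / LINK = logit;
RUN;

/* Step 1: add one candidate effect, then inspect its DF, standard error, */
/* and any convergence notes in the log before adding the next one.       */
PROC GENMOD DATA = datafile;
  CLASS Meds;
  MODEL Emotion_anger = Meds time x1 x2 x3 / LINK = log DIST = zinb OFFSET = log_var;
  ZEROMODEL time / LINK = logit;
RUN;
```

An effect that comes back with 0 DF, a zero standard error, or a very large estimate in the zero-inflation table is a candidate to leave out of ZEROMODEL.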

SteveDenham
Jade | Level 19

@StatDave - that note is gold. We have been trying a lot of methods to accomplish just a bit of this, and it has been a rough go. Super thanks for pointing it out.

 

SteveDenham

PSB
Fluorite | Level 6

Hello all,


Thanks much for your feedback (and apologies for my delayed response). Everyone provided great suggestions. @StatDave your insight re: LSMeans is much appreciated. I followed your suggestions and removed all variables from the ZEROMODEL command and then added each one in, similar to a step-wise approach.

 

Interestingly, the model converged and produced appropriate parameter estimates for ALL variables until I added in my final variable, X3 (which happens to be my main variable of interest). Running ZEROMODEL with X3 alone also led to convergence issues. That's odd, since it hasn't happened in any other model, so there must be an issue between the zeromodel for this specific outcome and X3 (which is a measure of affect on a 0-3 scale; we're using the total score, ranging from 27 to 65, M = 48).

 

If anyone has thoughts about that, that'd be great. Either way, I appreciate everyone's help! 

StatDave
SAS Super FREQ

By default, the ZEROMODEL is a logistic model. So, if your X3 variable completely separates the zero count observations, essentially making it a perfect predictor, then, just like in an ordinary logistic model, its parameter estimate is infinite, which prevents convergence. For more on the general problem of separation in logistic models, see the "Details/Existence of Maximum Likelihood Estimates" section in the PROC LOGISTIC documentation. More data might render that predictor at least slightly imperfect, making convergence possible. Unfortunately, for zero-inflated models, alternative estimation methods like exact estimation or the Firth penalized likelihood method are not available like they are for ordinary logistic models.
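A rough way to look for separation in the data is sketched below. The data set name check and the indicator is_zero are hypothetical names introduced only for this illustration. Keep in mind that the "extra" zeros in a zero-inflated model are not directly observed, so comparing predictors across observed zero and nonzero counts is only an approximate diagnostic.

```sas
/* Sketch only: 'check' and 'is_zero' are made-up names for illustration. */
DATA check;
  SET datafile;
  is_zero = (Emotion_anger = 0);   /* 1 if the observed count is zero, else 0 */
RUN;

/* An empty (or nearly empty) cell here would point to separation by Meds. */
PROC FREQ DATA = check;
  TABLES Meds*is_zero / NOROW NOCOL NOPERCENT;
RUN;

/* Non-overlapping X3 ranges across is_zero would point to separation by X3. */
PROC MEANS DATA = check N MIN MAX MEAN;
  CLASS is_zero;
  VAR x3;
RUN;
```

If the minimum of x3 in one group exceeds the maximum in the other, or one Meds level has no zero counts at all, an infinite estimate in the logit part would not be surprising.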

SteveDenham
Jade | Level 19

If you are using the total score, it is almost certain that at least one of the values between 27 and 65 shows complete separation if you treat it as a categorical variable by including it in your CLASS statement. Since it is your primary predictor, why not use it as a continuous variable? With up to 39 distinct values (27 through 65), it is a fairly good approximation to a continuous variable. So long as you can interpret the meaning of the model coefficient, you ought to be in good shape.

 

You could even look at polynomial or splined effects of the variable through use of the EFFECT statement. Or look at LSMEANS at various levels of X3 through the use of the AT option.
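A minimal sketch of the spline idea, assuming the EFFECT statement is available in your release of PROC GENMOD. The effect name x3_spl, the natural cubic basis, and the three percentile knots are illustrative assumptions, not tuned choices, and the ZEROMODEL effects shown are placeholders. LSMEANS is left out on purpose, given the caution raised earlier in this thread about using it with zero-inflated models.

```sas
/* Sketch only: x3_spl and the knot settings are illustrative assumptions. */
PROC GENMOD DATA = datafile;
  CLASS Meds;
  /* Natural cubic spline for X3 with knots placed at 3 percentiles of X3. */
  EFFECT x3_spl = SPLINE(x3 / NATURALCUBIC BASIS = TPF(NOINT) KNOTMETHOD = PERCENTILES(3));
  MODEL Emotion_anger = Meds time x1 x2 x3_spl / LINK = log DIST = zinb OFFSET = log_var;
  /* Zero-inflation effects kept small here; which ones to include is a separate question. */
  ZEROMODEL time x1 x2;
RUN;
```

A simpler alternative along the same lines is to keep x3 linear and add a quadratic term by listing x3*x3 in the MODEL statement.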

 

SteveDenham. 

PSB
Fluorite | Level 6
Hi all,

Apologies for the delay. I spent more time looking at my data and moved to GLIMMIX, as my ZINB models did not account for the longitudinal correlations between observations. Thanks much!

