I'm trying zero-truncated negative binomial (ZTNB) regression for the first time (SAS EG 8.2) to model cumulative days' supply of prescription meds based on individual-level characteristics (e.g., binary long-term care, sex, age group), where all subjects have at least one day's supply, i.e., no zeroes. I found the following code from:
https://stats.oarc.ucla.edu/sas/dae/zero-truncated-negative-binomial/
https://jbhender.github.io/Stats506/F18/GP/Group6.html
proc nlmixed data=ZTNB;
log_mu = intercept + b_long_term_care*long_term_care + b_female*female + b_agegr1*agegr1;
mu = exp(log_mu);
het = 1/alpha;
ll = lgamma(CumDays+het) - lgamma(CumDays + 1) - lgamma(het) - het*log(1+alpha*mu)
+ CumDays*log(alpha*mu) - CumDays*log(1+alpha*mu) - log(1 - (1 + alpha * mu)**-het);
model CumDays ~ general(ll);
run;
Running this code, I get the following in the log:
NOTE: To assign starting values to parameters, use the PARMS statement. The default starting value of 1.0 is in effect for all parameters.
ERROR: QUANEW Optimization cannot be completed.
NOTE: Optimization routine cannot improve the function value.
NOTE:The SAS System stopped processing this step because of errors.
Any guidance on how to run this procedure will be appreciated.
Good luck with your analysis,
Koen
Thank you for taking the time to reply. I did try a Google search and obtained similar results to yours, but nothing specific enough to my issue (combined with my lack of experience using this procedure). I got a reply to another thread I posted, advising me to use similar but not identical code to my initial attempt, referencing this note https://support.sas.com/kb/43/522.html:
proc nlmixed data=ZTNB;
y=CumDays;
mean = exp(intercept + b_long_term_care*long_term_care + b_female*female + b_agegr1*agegr1);
ll = lgamma(y+1/k) - lgamma(1/k) - lgamma(y+1) + y*log(k*mean) -
(y+(1/k))*log(1+k*mean) - log(1-(1+k*mean)**(-1/k));
model y ~ general(ll);
run;
or using proc FMM:
proc fmm data=ZTNB;
model CumDays = long_term_care female agegr1 / dist=truncnegbin;
run;
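For reference, both NLMIXED snippets in this thread code the same per-observation log-likelihood: the negative binomial log-probability minus the log of the probability of a nonzero count. A sketch in my own notation (not taken from the SAS note), where y_i is CumDays, mu_i is `mean` (or `mu`), and k is the dispersion parameter (`alpha` in the first snippet):

```latex
\ell_i = \ln\Gamma\!\left(y_i + \tfrac{1}{k}\right) - \ln\Gamma\!\left(\tfrac{1}{k}\right) - \ln\Gamma(y_i + 1)
       + y_i \ln(k\mu_i) - \left(y_i + \tfrac{1}{k}\right)\ln(1 + k\mu_i)
       - \ln\!\left[1 - (1 + k\mu_i)^{-1/k}\right],
\qquad y_i = 1, 2, \dots
```

Here \mu_i = \exp(x_i'\beta), and the last term is the truncation adjustment, since (1 + k\mu_i)^{-1/k} is the negative binomial probability of a zero.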
Why don't you try the zero-inflated negative binomial model? After all, zero-truncation is one of the causes of zero-inflation.
As the log you provided suggests, starting values are needed in the NLMIXED procedure. In my view, choosing starting values is itself subjective, which is why I suggest fitting the zero-inflated negative binomial model in the GENMOD procedure instead.
One more reminder concerns the choice of model itself. There are a number of zero-inflated models. Please check the requirements (e.g., assumptions) of the zero-inflated negative binomial model to see whether it is appropriate. Given the limited information you provided, I cannot see anything suggesting that those requirements are violated.
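If you do stay with NLMIXED, here is a minimal sketch of supplying starting values through a PARMS statement, reusing the variable and parameter names from the NLMIXED code earlier in the thread. The starting values themselves (log of the sample mean for the intercept, 0 for the slopes, 0.5 for k) are illustrative assumptions, not values derived from your data, and the BOUNDS statement is an added safeguard that was not in the original code.

```sas
/* Rough starting value for the intercept: log of the sample mean of CumDays. */
/* This is only a guess; truncation shifts the mean, so treat it as approximate. */
proc sql noprint;
   select log(mean(CumDays)) into :b0 trimmed from ZTNB;
quit;

proc nlmixed data=ZTNB;
   /* Illustrative starting values (assumptions): slopes at 0, dispersion k at 0.5 */
   parms intercept=&b0 b_long_term_care=0 b_female=0 b_agegr1=0 k=0.5;
   bounds k > 0;   /* keep the dispersion parameter positive */
   y = CumDays;
   mean = exp(intercept + b_long_term_care*long_term_care + b_female*female + b_agegr1*agegr1);
   /* Zero-truncated negative binomial log-likelihood, as in the earlier code */
   ll = lgamma(y+1/k) - lgamma(1/k) - lgamma(y+1) + y*log(k*mean)
        - (y+(1/k))*log(1+k*mean) - log(1-(1+k*mean)**(-1/k));
   model y ~ general(ll);
run;
```

If the optimizer still stalls, trying a few different values for k in the PARMS statement may help, since the default of 1.0 for every parameter can be far from a sensible region.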
Hello @Season ,
Are you sure zero-truncation can be considered a special case of zero-inflated?
Zero-inflated models can also deal with processes where the zeros are hyper-deflated (instead of hyper-inflated) ... but can they also deal with processes where the zeros are removed (whether you took them off or because the zero cases are just not observed for whatever reason)? I don't think so.
In any case, here's some further info on zero-inflated models in SAS
(taken from 30333 - FASTats: Frequently Asked-For Statistics (sas.com)):
Zero-inflated models
The most common are zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models. Zero-inflated models are often used to account for overdispersion (SAS Note 22630). ZIP and ZINB models are available in SAS/STAT PROC GENMOD. SAS/ETS PROC COUNTREG and SAS Viya PROC CNTSELECT also fit ZIP and ZINB models. Beginning in SAS 9.3, SAS/STAT PROC FMM can add zero-inflation to any of the wide range of models it can fit, which includes ZIP and ZINB models. Beginning in SAS 9.4, ZIP and ZINB models can be fit in SAS/STAT PROC HPGENSELECT and SAS/ETS PROC HPCOUNTREG. PROC HPGENSELECT allows for selection of effects in both the mean and zero-inflation parts of the model. GEE analysis of zero-inflated models is not available. However, zero-inflated models can also be fit using SAS/STAT PROC NLMIXED, which also allows for inclusion of random effects. See the example in SAS Note 44354 and the example here of zero-inflated count models, and SAS Note 52161 for an example of a zero-inflated binomial model.
Koen
@sbxkoenk wrote:
Hello @Season ,
Are you sure zero-truncation can be considered a special case of zero-inflated?
Yes, I think so. In fact, according to Breen's book, the concept of "truncation" applies to a sample rather than to a variable. As Breen notes, "truncation" refers to the combination of two conditions: (1) the dependent variable y is observed only if some criterion defined in terms of the value of y is met, such as y > c, where c is a constant; and (2) the explanatory variables are observed only if y is observed.
If we set aside the explanatory variables and focus only on the dependent variable, it is easy to see that if the definition of truncation is met and the criterion is y ≥ 0, then y would be zero-inflated.
However, truncation is not the only cause of zero-inflation. Censoring or sample selection can also produce zero-inflation. Please refer to Breen's book for a more detailed definition of the trio (truncation, sample selection, and censoring), which is summarized in Table 1.1 of that book.
From my reading, zero-inflated and zero-truncated data are not the same phenomena. In my case, all my subjects had at least one prescription, so our outcome of interest, days' supply of medication prescribed, has no zero values.
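To make the distinction concrete, here is a sketch of the two probability functions as I understand them. The notation is mine, not from the references in this thread: f is the ordinary (untruncated) negative binomial probability function and \pi is the mixing probability for extra zeros.

```latex
\text{Zero-inflated:}\quad
P(Y = 0) = \pi + (1 - \pi)\, f(0), \qquad
P(Y = y) = (1 - \pi)\, f(y), \quad y = 1, 2, \dots

\text{Zero-truncated:}\quad
P(Y = 0) = 0, \qquad
P(Y = y) = \frac{f(y)}{1 - f(0)}, \quad y = 1, 2, \dots
```

The zero-inflated form adds probability mass at zero, while the zero-truncated form removes it entirely and rescales the rest, which is the situation in data with no zero counts.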
@amyip wrote:
From my reading, zero-inflated and zero-truncated data are not the same phenomena. In my case, all my subjects had at least one prescription, so our outcome of interest, days' supply of medication prescribed, has no zero values.
I think that if you slightly modify Breen's concept of truncation to y ≥ c, instead of sticking to the original definition of y > c, and choose zero as the constant c, then zero-truncation can be a source of zero-inflation. The two are not identical; the former can be one of the sources of the latter. This subtle difference might be more a philosophical than a statistical issue.
On the other hand, if you wish to strictly apply Breen's definition, then zero-truncation is not a cause of zero-inflation, because only the y's that are strictly larger than zero are recorded in the data, which is your case.