Thank you, Koen, for your detailed reply!
@sbxkoenk wrote:
"Zero-inflated" response and "limited" response are not the same.
Are you dealing with count data or continuous data (0 and above)?
The term "limited dependent variable", strange and unintuitive as it is to me, comes from the econometric literature. For instance, I retrieved two books (Limited-Dependent and Qualitative Variables in Econometrics (cambridge.org) and Analysis of Panels and Limited Dependent Variable Models (cambridge.org)), skimmed their contents, and found that the so-called "limited dependent variables" are in fact sample-selected variables. On many occasions they are zero-inflated as well. I avoided the term "zero-inflated" in the title so that people drawn in by this phrase would not come straight into this post and tell me to use PROC GENMOD because it can model zero-inflated count data, without noticing that the variable I wish to model is continuous rather than discrete. The count-data case is one whose solution has already been given in SAS Help as well as in much of the literature.
As I have said in the post and the paragraph above, the variable I wish to model is continuous.
@sbxkoenk wrote:
Are you talking about zero-inflated models (gamma, lognormal, Poisson, negative binomial) or are you talking about (Gaussian) mixture models?
I am not sure of the exact definition of a "lognormal zero-inflated model", as I have not yet seen this phrase in the literature on zero-inflated models that I have come across. If it refers to a model for a variable that is zero-inflated and whose non-zero part follows a lognormal distribution, then that is the model I wish to build.
By the way, I am not sure about the definition of (Gaussian) mixture models either. Do you mean the finite mixture models that PROC FMM can fit, in which the dependent variable follows a finite mixture of (normal) distributions? To the best of my knowledge, these models are not typically classified as zero-inflated models. Can they model zero-inflated data as well?
@sbxkoenk wrote:
I think your 2-parts are :
a logistic regression to P(y=0) and
a gamma (or log-normal) error regression with log link to E(y | y>0))
I do not think the errors would still be log-normal if the link function is the natural log of the conditional expectation of the dependent variable given that it exceeds zero, ln E(y | y>0). But that is a minor point. Aside from that, what you have outlined is exactly what I want.
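To check my understanding of the two parts, here is a minimal sketch in Python rather than SAS. It assumes an intercept-only model (no covariates), a point mass at zero with probability p_zero, and a lognormal distribution for the positive part. The function names and the toy data are hypothetical and only for illustration.

```python
import math

def lognormal_logpdf(y, mu, sigma):
    """Log density of a lognormal variable at y > 0."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z

def two_part_loglik(ys, p_zero, mu, sigma):
    """Joint log-likelihood: point mass at zero plus lognormal for y > 0."""
    total = 0.0
    for y in ys:
        if y == 0:
            total += math.log(p_zero)
        else:
            total += math.log(1.0 - p_zero) + lognormal_logpdf(y, mu, sigma)
    return total

def fit_two_part(ys):
    """Closed-form maximum likelihood estimates for the intercept-only case."""
    logs = [math.log(y) for y in ys if y > 0]
    p_zero = 1.0 - len(logs) / len(ys)          # share of zeros
    mu = sum(logs) / len(logs)                  # mean of log(y) over y > 0
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return p_zero, mu, sigma

# Made-up toy data: four zeros and six positive values.
ys = [0, 0, 0, 1.2, 3.5, 0.8, 10.0, 2.2, 0, 5.1]
p_zero, mu, sigma = fit_two_part(ys)
best = two_part_loglik(ys, p_zero, mu, sigma)

# Overall mean implied by the two parts: E(y) = P(y > 0) * E(y | y > 0).
expected_y = (1.0 - p_zero) * math.exp(mu + 0.5 * sigma ** 2)
```

In this simplified setting the joint log-likelihood is a sum of a term that involves only p_zero and a term that involves only mu and sigma, so each part can be maximized on its own. With covariates, the same structure would need a numerical optimizer instead of closed-form estimates.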
@sbxkoenk wrote:
With PROC NLMIXED you can maximize the (log-)likelihood jointly. However, it could be (will be) the likelihood separates anyway, so you don't get improved parameter estimates as a result (only advantage then is that you do it with one function call and you estimate the combined function E(y) which includes the zeroes).
Thank you for the reminder! I have been reading a monograph (Regression Models: Censored, Sample Selected, or Truncated Data, by Richard Breen, Quantitative Applications in the Social Sciences, ISBN 9780803957107) on zero-inflated continuous data, which are also termed "sample selected data". When explaining how to model a zero-inflated variable whose non-zero part follows a normal distribution, the author shows why it is inappropriate not to maximize the joint likelihood: the ordinary least squares estimators of the regression coefficients, whether computed on the non-zero portion or on the entire sample, are biased and/or inconsistent, except under rare conditions that are, in my opinion, hard to verify with either the data at hand or professional knowledge. By contrast, the estimator of the regression coefficients obtained by maximizing the joint likelihood is guaranteed to be unbiased and consistent. So I think it is the safer choice.
@sbxkoenk wrote:
Note that you can use PROC NLMIXED in the absence of random effects (only fixed effects is fine here).
There are other ways in SAS to maximize (any tailored) likelihood beyond PROC NLMIXED.
Koen
Thank you for pointing out the fact that PROC NLMIXED can be of help! Could you please provide some details on the other ways of maximizing tailored likelihood functions that you mentioned?
Thanks!