Hi,
I am dealing with a problem where my dependent variable is continuous but contains many zeros (about 25% of the observations). The purpose of my study is out-of-sample prediction, so I would expect a number of predicted values to be zero as well. I understand that I cannot use a count model, since my dependent variable is continuous. OLS is a possibility, but here it gives low predictions, hardly any of which can be considered zero. I also tried a GLM with a Tweedie distribution and link=log; it likewise gives no predictions close to zero, although I would expect some. However, when I ran a Tobit model left-censored at zero, the mean of its predictions was very close to the observed mean. Tobit also generated zero predictions, but it predicted zeros for about 68% of cases, which is far higher than the roughly 25% observed.
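The gap between a 25% observed zero rate and a 68% predicted zero rate may come down to which kind of prediction is being compared. The sketch below is a minimal Python illustration (not SAS code) under stated assumptions: a Tobit model left-censored at zero with latent error standard deviation `sigma`, and a Tweedie GLM with log link and power 1 < p < 2 (compound Poisson-gamma). The inputs `xb`, `sigma`, `mu`, `phi`, and `p` are hypothetical values, not estimates from this data. For both models the mean prediction is strictly positive, yet each model still assigns a positive probability to an exact zero; the Tobit point prediction max(0, x'beta) is a different quantity again.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_predictions(xb: float, sigma: float) -> dict:
    """Prediction types for a Tobit model left-censored at zero.

    xb is the linear predictor x'beta; sigma is the latent error SD.
    """
    z = xb / sigma
    p_positive = norm_cdf(z)  # P(y > 0 | x)
    return {
        "latent": xb,                    # E[y* | x]; can be negative
        "censored_point": max(0.0, xb),  # exactly zero whenever xb <= 0
        "p_zero": 1.0 - p_positive,      # P(y = 0 | x)
        "mean": p_positive * xb + sigma * norm_pdf(z),  # E[y | x]; always > 0
    }


def tweedie_zero_prob(mu: float, phi: float, p: float) -> float:
    """P(y = 0) for a compound Poisson-gamma Tweedie with 1 < p < 2.

    The GLM's predicted mean mu = exp(x'beta) is always > 0, but the
    distribution still puts probability exp(-lam) on exact zeros.
    """
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))  # Poisson mean of the number of gamma summands
    return math.exp(-lam)


r = tobit_predictions(xb=-0.5, sigma=1.0)
print(r["censored_point"], round(r["p_zero"], 4), round(r["mean"], 4))  # 0.0 0.6915 0.1978
print(round(tweedie_zero_prob(mu=1.0, phi=1.0, p=1.5), 4))              # 0.1353
```

Averaging `p_zero` (or `tweedie_zero_prob`) over the sample gives a predicted share of zeros that can be compared with the observed 25%; counting the cases where `censored_point` is zero answers a different question.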
Next, I am going to estimate a hurdle regression, but I would appreciate any suggestions for alternative models that might be better suited.
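A two-part (hurdle) model separates "is y zero?" from "how large is y when it is positive?". Here is a minimal Python sketch, assuming a logit model for P(y > 0) and a normal linear regression of log(y) on the positive observations (so the positive part is lognormal). The linear predictors and `sigma_log` are hypothetical inputs from two separately fitted models, and the function names are illustrative. The mean prediction p * exp(mu + sigma^2 / 2) is never exactly zero, so zero point predictions need an explicit decision rule; the 0.5 `threshold` used here is an assumption, not a standard.

```python
import math


def logistic(t: float) -> float:
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def hurdle_predict(xb_logit: float, xb_log: float, sigma_log: float,
                   threshold: float = 0.5) -> tuple:
    """Two-part (hurdle) predictions for one observation.

    xb_logit:  linear predictor of the logit model for P(y > 0)
    xb_log:    linear predictor of the regression of log(y) on y > 0
    sigma_log: residual SD of that log-scale regression
    threshold: hypothetical cut-off on P(y > 0) for calling y zero
    """
    p_pos = logistic(xb_logit)                          # part 1: P(y > 0 | x)
    mean_pos = math.exp(xb_log + 0.5 * sigma_log ** 2)  # part 2: E[y | y > 0, x]
    mean = p_pos * mean_pos                             # E[y | x]; always > 0
    point = mean_pos if p_pos >= threshold else 0.0     # classify first, then predict
    return mean, point


print(hurdle_predict(-1.0, 1.0, 0.5))  # low P(y > 0): point prediction is 0.0
print(hurdle_predict(2.0, 1.0, 0.5))   # high P(y > 0): point prediction is E[y | y > 0]
```

The exp(sigma^2 / 2) retransformation relies on normal, constant-variance errors on the log scale. The threshold could also be tuned so that the predicted share of zeros is close to the observed 25%.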
Thanks in advance.
-CD
If you haven't investigated PROC FMM (finite mixture models), you might want to look at that, especially the examples. In particular, the prescreening of the data with PROC KDE might open up some other ideas.
Steve Denham
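To illustrate what KDE prescreening looks at, here is a minimal Python sketch of a Gaussian kernel density estimate. It is not PROC KDE: it assumes a normal-reference bandwidth (1.06 * s * n^(-1/5)), which may differ from PROC KDE's default bandwidth selection, and the `positives` values are hypothetical. It is meant for the log of the positive responses only, since the spike of exact zeros would otherwise dominate; several distinct modes would hint that more than one component might be worth trying in PROC FMM.

```python
import math
import statistics


def gaussian_kde(data, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate.

    If no bandwidth is given, use the normal-reference rule 1.06 * s * n**(-1/5).
    """
    n = len(data)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical positive responses, screened on the log scale.
positives = [0.4, 0.5, 0.7, 0.8, 1.1, 6.0, 7.5, 8.2, 9.0]
log_values = [math.log(v) for v in positives]
f = gaussian_kde(log_values)
grid = [i / 10 for i in range(-15, 31)]
for x in grid[::5]:
    print(f"{x:5.1f} {f(x):.3f}")
```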
Thanks. I will look into that.
Hi Steve,
As per your suggestion, I have been experimenting with PROC FMM. I looked through the 130-page SAS documentation for the FMM procedure and a few other documents, but I am still confused about a few things. Most of the examples out there use count data. As I mentioned earlier, the response variable in my data is continuous but has many zeros. I think what I am trying to do is mix a logit model (for the zero vs. nonzero part) with a lognormal distribution (for the positive part). This is what I am doing:
(For the second MODEL statement I tried both dist=constant and dist=binary. With dist=binary I don't get any zero predictions, although I would expect some. I'm not sure whether I have specified this part wrong or the prediction part.)
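proc fmm data=datafile;
   /* Component 1: lognormal regression for the positive values */
   model x = y1 y2 y3 / noint dist=lognormal;
   /* Component 2: constant (point-mass) component for the zeros */
   model x = / dist=constant;
   /* Mixing probabilities modeled as a function of y1-y3 */
   probmodel y1 y2 y3;
   output out=fmm predicted residual;
run;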
Thank you very much.
I hadn't even considered dist=constant. That's clever, and it makes the model look more like a hurdle model, which I think would fit the process better.
Steve Denham
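One way to see why dist=constant behaves like a hurdle model: a lognormal density is zero at y = 0, and a point mass at zero contributes nothing at y > 0, so each observation's component membership is already determined by whether it is zero. The likelihood then splits into a binary (zero vs. positive) part for the mixing probabilities and a lognormal fit to the positive values alone. The Python sketch below assumes the constant component is a point mass at zero (my reading of DIST=CONSTANT; check the PROC FMM documentation) and uses hypothetical values for `pi_zero`, `mu`, and `sigma`. It also shows that the mixture mean, (1 - pi_zero) * exp(mu + sigma^2 / 2), is positive whenever pi_zero < 1, so if the predicted values are mixture means, exact zeros should not be expected from them.

```python
import math


def lognormal_pdf(y: float, mu: float, sigma: float) -> float:
    """Lognormal density; zero for y <= 0."""
    if y <= 0.0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))


def posterior_zero_component(y: float, pi_zero: float, mu: float, sigma: float) -> float:
    """Posterior probability that y came from the point mass at zero.

    Assumes 0 < pi_zero < 1. The point mass contributes only at y == 0 and
    the lognormal only at y > 0, so the result is always exactly 0 or 1.
    """
    zero_part = pi_zero if y == 0.0 else 0.0
    lognormal_part = (1.0 - pi_zero) * lognormal_pdf(y, mu, sigma)
    return zero_part / (zero_part + lognormal_part)


def mixture_mean(pi_zero: float, mu: float, sigma: float) -> float:
    """Mean of the two-component mixture; > 0 whenever pi_zero < 1."""
    return (1.0 - pi_zero) * math.exp(mu + 0.5 * sigma ** 2)


for y in [0.0, 0.3, 2.5]:
    # prints 1.0 for y = 0 and 0.0 for each y > 0
    print(y, posterior_zero_component(y, pi_zero=0.25, mu=0.0, sigma=1.0))
print(mixture_mean(pi_zero=0.25, mu=0.0, sigma=1.0))
```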