cd2011
Calcite | Level 5

Hi,

I am dealing with a problem where my dependent variable is continuous but contains many zeros (about 25%). The purpose of my study is out-of-sample prediction, so I would expect several predicted values to be zero as well. I understand that I cannot use a count model since my dependent variable is continuous. OLS is a possibility, but in this case it gives low predictions, hardly any of which can be considered zero. I also tried a GLM with a Tweedie distribution and link=log; this also gives no predictions as close to zero as I would expect. However, I ran a tobit model censored at a lower bound of zero, and it gave me a mean value very close to the observed mean. Tobit also generated zero predictions, but it predicted zeros for about 68% of cases, which is very high.

Next, I am going to estimate a hurdle regression but I would appreciate any suggestions for an alternative model that might be better suited.

Thanks in advance.

-CD

4 REPLIES
SteveDenham
Jade | Level 19

If you haven't investigated PROC FMM (finite mixture models), you might want to look at that, especially the examples.  In particular, the prescreening of the data with PROC KDE might open up some other ideas.

Steve Denham

cd2011
Calcite | Level 5

Thanks. I will look into that.

cd2011
Calcite | Level 5

Hi Steve,

As per your suggestion, I have been experimenting with PROC FMM. I looked through the 130-page SAS documentation on the FMM procedure and a few other documents, but I am still confused about a few things. Most of the examples out there are on count data. As I mentioned earlier, the response variable in my data is continuous but has many zeros. I think what I am trying to do is mix a logit model (for the zero versus nonzero part) with a lognormal distribution (for the positive part). This is what I am doing:

(For the second model statement I tried both dist=constant and dist=binary. With binary I don't get any zero predictions, which I would normally expect. I am not sure whether I am doing this part wrong or the prediction part wrong.)

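proc fmm data=datafile;
   /* component 1: lognormal model for the positive values */
   model x = y1 y2 y3 / noint dist=lognormal;
   /* component 2: degenerate (constant) component for the zeros */
   model x = / dist=constant;
   /* mixing probabilities modeled on the same covariates */
   probmodel y1 y2 y3;
   output out=fmm predicted residual;
run;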

Thank you very much.

SteveDenham
Jade | Level 19

I hadn't even considered the dist=constant--that's clever, and it makes it look more like a hurdle model, which would fit the process better, I think.

Steve Denham
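
For comparison with the FMM setup discussed above, here is a minimal sketch of the hurdle (two-part) idea fitted as two separate models: a logistic regression for zero versus positive, and a linear regression on log(x) for the positive values. The data set and variable names (datafile, x, y1 y2 y3) come from the thread; everything else (twopart, positive, logx, part1, part2, est, p_positive, pred_logx, pred_mean, pred_zero, and the 0.5 cutoff) is hypothetical and only for illustration. Unlike the original code, the sketch keeps the intercept. The back-transformation exp(mu + sigma^2/2) assumes normal, constant-variance errors on the log scale.

```sas
/* Flag positive responses and take logs of the positive values only */
data twopart;
   set datafile;
   positive = (x > 0);
   if x > 0 then logx = log(x);   /* stays missing when x = 0 */
run;

/* Part 1: probability that the response is positive */
proc logistic data=twopart;
   model positive(event='1') = y1 y2 y3;
   output out=part1 p=p_positive;
run;

/* Part 2: linear model for log(x); rows with missing logx are not
   used in the fit but still receive a predicted value */
proc reg data=part1 outest=est noprint;
   model logx = y1 y2 y3;
   output out=part2 p=pred_logx;
run;
quit;

/* Combine the two parts */
data pred;
   if _n_ = 1 then set est(keep=_RMSE_);
   set part2;
   /* expected value: P(x > 0) * E[x | x > 0] under lognormal errors */
   pred_mean = p_positive * exp(pred_logx + _RMSE_**2 / 2);
   /* assumed classification rule: call it a zero if P(x > 0) < 0.5 */
   pred_zero = (p_positive < 0.5);
run;
```

Note that pred_mean is a product of a probability and a positive number, so it is never exactly zero. The same holds for the overall predicted mean of a mixture with a lognormal component. If zero predictions are needed, they have to come from a classification rule on the probability part, such as the cutoff used for pred_zero, and the cutoff itself is a modeling choice worth checking on held-out data.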
