cd2011
Calcite | Level 5

Hi,

I am dealing with a problem where my dependent variable is continuous but contains many zeros (about 25%). The purpose of my study is out-of-sample prediction, so I would expect several predicted values to be zero as well. I understand that I cannot use a count model since my dependent variable is continuous. OLS is a possibility, but in this case OLS gives low predictions, hardly any of which can be considered zero. I also tried a GLM with a Tweedie distribution and link=log; this also gives no predictions as close to zero as I would expect. However, I ran a tobit model with the lower bound censored at zero, and it gave me a mean value very close to the observed mean. Tobit also generated zero predictions, but it predicted zeros for about 68% of cases, which is very high.

Next, I am going to estimate a hurdle regression, but I would appreciate any suggestions for an alternative model that might be better suited.

Thanks in advance.

-CD

4 REPLIES
SteveDenham
Jade | Level 19

If you haven't investigated PROC FMM (finite mixture models), you might want to look at that, especially the examples.  In particular, the prescreening of the data with PROC KDE might open up some other ideas.

Steve Denham

cd2011
Calcite | Level 5

Thanks. I will look into that.

cd2011
Calcite | Level 5

Hi Steve,

As per your suggestion, I have been experimenting with PROC FMM. I looked through the 130-page SAS documentation on the FMM procedure and a few other documents, but I am still confused about a few things. Most of the examples out there use count data. As I mentioned earlier, the response variable in my data is continuous but has many zeros. I think what I am trying to do is mix a logit (for the zero versus nonzero part) with a lognormal distribution (for the positive part). This is what I am doing:

(For the second model statement I tried both dist=constant and dist=binary. With binary I don't get any of the zero predictions that I would normally expect. I am not sure whether I am doing this part wrong or the prediction part wrong.)

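proc fmm data=datafile;
   /* Component 1: lognormal for the positive part */
   model x = y1 y2 y3 / noint dist=lognormal;
   /* Component 2: point mass (dist=constant is degenerate at zero by default) */
   model + / dist=constant;
   /* Mixing probabilities depend on the covariates (logit link by default) */
   probmodel y1 y2 y3;
   output out=fmm predicted residual;
run;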
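
Written out, the two-part mixture described above can be sketched as follows. The symbols p, mu, sigma, beta and gamma are notation introduced here for illustration only and do not come from PROC FMM output. The sketch assumes that p is the probability of the positive (lognormal) component; which component's logit PROC FMM actually reports should be checked against the output. It also assumes that the lognormal mean has no intercept (because of noint) and that the probability model keeps its default intercept.

```latex
\begin{aligned}
\Pr(x = 0 \mid y_1, y_2, y_3) &= 1 - p, \\
x \mid x > 0 &\sim \operatorname{Lognormal}(\mu, \sigma^2), \\
\operatorname{logit}(p) &= \gamma_0 + \gamma_1 y_1 + \gamma_2 y_2 + \gamma_3 y_3, \\
\mu &= \beta_1 y_1 + \beta_2 y_2 + \beta_3 y_3, \\
\mathbb{E}[x \mid y_1, y_2, y_3] &= p \, \exp\!\left(\mu + \tfrac{\sigma^2}{2}\right).
\end{aligned}
```

Because p is strictly between 0 and 1 for finite covariate values, this mean is always strictly positive. If the predicted value is the mixture mean, it can therefore be small but never exactly zero; getting zero predictions would need an extra rule, for example predicting zero whenever the estimated p falls below a chosen cutoff (the cutoff is an assumption, not something the model supplies).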

Thank you very much.

SteveDenham
Jade | Level 19

I hadn't even considered the dist=constant--that's clever, and it makes it look more like a hurdle model, which would fit the process better, I think.

Steve Denham
