Model selection when dependent variables consists ...


12-18-2013 05:00 PM

Hi,

I am dealing with a problem where my dependent variable is continuous but contains many zeros (about 25% of the observations). The purpose of my study is out-of-sample prediction, so I would expect a good share of the predicted values to be zero as well. I understand that I cannot use a count model, since my dependent variable is continuous. OLS is one possibility, but here OLS gives low predictions, hardly any of which could be considered zero. I also tried a GLM with a Tweedie distribution and link=log; it also produces no predictions close to zero, whereas I would expect some. However, when I ran a Tobit model left-censored at zero, it gave a mean prediction very close to the observed mean. Tobit also generated zero predictions, but it predicted zeros for about 68% of cases, which is very high compared with the roughly 25% zeros in the data.
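For the Tobit comparison, it may help to separate two kinds of Tobit "prediction". With linear predictor xb and latent error standard deviation sigma (left-censoring at zero), the clipped latent prediction max(0, xb) is exactly zero whenever xb <= 0, while the expected observed value E[y] = Phi(xb/sigma)*xb + sigma*phi(xb/sigma) is always strictly positive. Which of these is used as "the prediction" may largely determine how many zeros appear. The sketch below is in Python purely to show the arithmetic; the xb and sigma values are hypothetical, not estimates from this data.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_predictions(xb: float, sigma: float) -> dict:
    """Two kinds of prediction from a Tobit model left-censored at zero.

    xb    -- linear predictor x'beta (hypothetical value)
    sigma -- standard deviation of the latent normal error (hypothetical)
    """
    z = xb / sigma
    p_positive = normal_cdf(z)                                # P(y > 0)
    expected_y = p_positive * xb + sigma * normal_pdf(z)      # E[y], always > 0
    clipped_latent = max(0.0, xb)                             # 0 whenever xb <= 0
    return {
        "p_positive": p_positive,
        "expected_y": expected_y,
        "clipped_latent": clipped_latent,
    }


if __name__ == "__main__":
    # Hypothetical linear predictors; sigma = 2.0 is also an assumption.
    for xb in (-3.0, -0.5, 0.0, 1.5):
        pred = tobit_predictions(xb, sigma=2.0)
        print(f"xb={xb:5.1f}  P(y>0)={pred['p_positive']:.3f}  "
              f"E[y]={pred['expected_y']:.3f}  clipped={pred['clipped_latent']:.1f}")
```

The clipped latent value can be zero for a large share of cases even when the average of E[y] matches the observed mean.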

Next, I am going to estimate a hurdle regression, but I would appreciate any suggestions for an alternative model that might be better suited.
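A hurdle (two-part) model treats the zeros and the positive values separately: a binary model for P(y > 0) and a continuous model fitted only to the positive observations. Assuming a logit for the first part and a lognormal for the second (log y ~ N(mu, sigma^2) among positives), the unconditional mean is P(y > 0) * exp(mu + sigma^2/2), which is never exactly zero. Exact zero predictions therefore have to come from a classification rule on P(y > 0). The 0.5 cutoff below is a hypothetical choice, and all linear-predictor values are made up for illustration; Python is used only to show the arithmetic.

```python
import math


def inv_logit(eta: float) -> float:
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def hurdle_predict(eta_zero: float, mu_log: float, sigma_log: float,
                   cutoff: float = 0.5) -> tuple:
    """Two-part (hurdle) prediction for a nonnegative continuous response.

    eta_zero  -- linear predictor of the logit model for P(y > 0)
    mu_log    -- linear predictor for log(y) among positive observations
    sigma_log -- residual SD of log(y) among positive observations
    cutoff    -- hypothetical threshold for classifying a case as zero
    """
    p_pos = inv_logit(eta_zero)
    mean_if_positive = math.exp(mu_log + 0.5 * sigma_log ** 2)  # lognormal mean
    expected = p_pos * mean_if_positive       # unconditional E[y], never exactly 0
    point = mean_if_positive if p_pos >= cutoff else 0.0  # thresholded prediction
    return p_pos, expected, point


if __name__ == "__main__":
    # Hypothetical linear predictors for two cases.
    for eta in (-1.2, 0.8):
        p, ey, pt = hurdle_predict(eta, mu_log=1.0, sigma_log=0.6)
        print(f"P(y>0)={p:.3f}  E[y]={ey:.3f}  point prediction={pt:.3f}")
```

The choice between reporting E[y] and a thresholded point prediction depends on whether the goal is matching the mean or reproducing the share of zeros.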

Thanks in advance.

-CD


Posted in reply to cd2011

12-19-2013 09:33 AM

If you haven't investigated PROC FMM (finite mixture models), you might want to look at it, especially the examples. In particular, prescreening the data with PROC KDE might open up some other ideas.

Steve Denham
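As a rough picture of what a kernel density prescreen computes, here is a Gaussian kernel density estimate in Python on a small hypothetical sample with 25% exact zeros. The bandwidth uses the normal-reference rule of thumb h = 1.06 * sd * n^(-1/5), an assumption chosen for simplicity; PROC KDE has its own bandwidth selection options and may choose differently. A pile-up of density at zero sitting apart from a hump of positive values is the kind of shape that suggests a two-component (zero plus positive) mixture. A Gaussian kernel also spreads some density below zero, a known limitation for bounded data.

```python
import math
import statistics


def gaussian_kde(data: list, grid: list) -> list:
    """Gaussian kernel density estimate evaluated at each grid point.

    Bandwidth uses the normal-reference rule h = 1.06 * sd * n**(-1/5),
    an assumption chosen for simplicity.
    """
    n = len(data)
    sd = statistics.stdev(data)
    h = 1.06 * sd * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data)
            for g in grid]


if __name__ == "__main__":
    # Hypothetical sample: 5 exact zeros plus 15 positive values (25% zeros).
    sample = [0.0] * 5 + [2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.5, 5.0, 5.6, 6.3,
                          3.1, 2.5, 4.8, 3.7, 4.0]
    grid = [i * 0.5 for i in range(15)]   # 0.0, 0.5, ..., 7.0
    for g, d in zip(grid, gaussian_kde(sample, grid)):
        print(f"{g:4.1f}  {'#' * int(d * 100)}")   # crude text histogram
```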


Posted in reply to SteveDenham

12-19-2013 09:35 AM

Thanks. I will look into that.


Posted in reply to cd2011

12-27-2013 03:25 PM

Hi Steve,

As per your suggestion, I have been experimenting with PROC FMM. I looked through the 130-page SAS documentation on the FMM procedure and a few other documents, but I am still confused about a few things. Most of the examples out there use count data. As I mentioned earlier, the response variable in my data is continuous but has many zeros. I think what I am trying to do is mix a logit model (for the zero vs. nonzero part) with a lognormal distribution (for the positive part). This is what I am doing:

(For the second MODEL statement I tried both dist=constant and dist=binary. With binary I don't get any zero predictions, although I would normally expect some. I'm not sure whether I am setting up this part wrong or the prediction part.)

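```sas
/* Two-component mixture for a continuous response x with many zeros */
proc fmm data=datafile;
   model x = y1 y2 y3 / noint dist=lognormal;  /* component for positive values */
   model x = / dist=constant;                  /* degenerate component at zero */
   probmodel y1 y2 y3;                         /* covariates for the mixing probabilities */
   output out=fmm predicted residual;
run;
```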

Thank you very much.


Posted in reply to cd2011

12-30-2013 01:04 PM

I hadn't even considered dist=constant. That's clever, and it makes the model look more like a hurdle model, which I think would fit the process better.

Steve Denham
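One way to see why DIST=CONSTANT pushes the mixture toward a hurdle model: the lognormal component has zero density at y = 0 and the constant component puts all of its mass there, so every observation's likelihood involves only one component. With pi the probability of the lognormal component, a zero contributes log(1 - pi) and a positive y contributes log(pi) + log f_LN(y). The log-likelihood therefore splits into a binary part (which PROBMODEL models with covariates) plus a lognormal part for the positives, the same structure as a hurdle model. A minimal Python sketch with hypothetical parameter values; pi is held constant here instead of depending on y1-y3.

```python
import math


def lognormal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of a lognormal distribution (log y ~ N(mu, sigma^2))."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def mixture_loglik(ys: list, pi_pos: float, mu: float, sigma: float) -> float:
    """Log-likelihood of a two-component mixture: lognormal plus point mass at 0.

    pi_pos is the probability of the lognormal component (hypothetical,
    held constant here instead of depending on covariates).
    """
    total = 0.0
    for y in ys:
        if y == 0.0:
            total += math.log(1.0 - pi_pos)   # only the constant component
        else:
            total += math.log(pi_pos) + lognormal_logpdf(y, mu, sigma)
    return total


def hurdle_loglik(ys: list, pi_pos: float, mu: float, sigma: float) -> float:
    """Same likelihood written as two separate parts, as in a hurdle model."""
    n_pos = sum(1 for y in ys if y > 0.0)
    n_zero = len(ys) - n_pos
    binary_part = n_pos * math.log(pi_pos) + n_zero * math.log(1.0 - pi_pos)
    positive_part = sum(lognormal_logpdf(y, mu, sigma) for y in ys if y > 0.0)
    return binary_part + positive_part


if __name__ == "__main__":
    data = [0.0, 0.0, 1.3, 2.7, 0.0, 4.1, 0.9, 3.3]   # hypothetical responses
    print(mixture_loglik(data, pi_pos=0.6, mu=0.8, sigma=0.7))
    print(hurdle_loglik(data, pi_pos=0.6, mu=0.8, sigma=0.7))  # equal up to rounding
```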