<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Model selection when dependent variables consists of zeros in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177109#M9202</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I hadn't even considered the dist=constant--that's clever, and it makes it look more like a hurdle model, which would fit the process better, I think.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 30 Dec 2013 18:04:09 GMT</pubDate>
    <dc:creator>SteveDenham</dc:creator>
    <dc:date>2013-12-30T18:04:09Z</dc:date>
    <item>
      <title>Model selection when dependent variables consists of zeros</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177105#M9198</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My dependent variable is continuous but contains many zeros (about 25% of observations). The purpose of my study is out-of-sample prediction, so I would expect several predicted values to be zero as well. I understand that I cannot use a count model, since my dependent variable is continuous. OLS is a possibility, but in this case it gives low predictions, hardly any of which can be considered zero. I also tried a GLM with a Tweedie distribution and link=log; this gives no predictions close to zero either, contrary to what I would expect. However, I ran a tobit model with the lower bound censored at zero, and it gave me a mean value very close to the observed mean value. Tobit also generated zero predictions, but it predicted zero for about 68% of cases, which is very high.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Next, I am going to estimate a hurdle regression, but I would appreciate any suggestions for an alternative model that might be better suited.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt; -CD&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 18 Dec 2013 22:00:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177105#M9198</guid>
      <dc:creator>cd2011</dc:creator>
      <dc:date>2013-12-18T22:00:44Z</dc:date>
    </item>
    <item>
      <title>Re: Model selection when dependent variables consists of zeros</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177106#M9199</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you haven't investigated PROC FMM (finite mixture models), you might want to look at that, especially the examples.&amp;nbsp; In particular, the prescreening of the data with PROC KDE might open up some other ideas.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Dec 2013 14:33:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177106#M9199</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2013-12-19T14:33:43Z</dc:date>
    </item>
    <item>
      <title>Re: Model selection when dependent variables consists of zeros</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177107#M9200</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks. I will look into that.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 19 Dec 2013 14:35:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177107#M9200</guid>
      <dc:creator>cd2011</dc:creator>
      <dc:date>2013-12-19T14:35:23Z</dc:date>
    </item>
    <item>
      <title>Re: Model selection when dependent variables consists of zeros</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177108#M9201</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Steve,&lt;/P&gt;&lt;P&gt;As per your suggestion, I have been experimenting with PROC FMM. I looked through the 130-page SAS documentation on the FMM procedure and a few other documents, but I am still confused about a few things. Most of the examples out there are on count data. As I mentioned earlier, the response variable in my data is continuous but has many zeros. I think what I am trying to do is mix a logit model (for the zero versus nonzero part) with a lognormal distribution (for the positive part). This is what I am doing:&lt;/P&gt;&lt;P&gt;(For the second MODEL statement I tried both dist=constant and dist=binary. With dist=binary I don't get any zero predictions, which I would normally expect. I am not sure whether I am doing this part wrong or the prediction part wrong.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc fmm data=datafile;&lt;/P&gt;&lt;P&gt;model x = y1 y2 y3 / noint dist=lognormal;&lt;/P&gt;&lt;P&gt;model x = / dist=constant;&lt;/P&gt;&lt;P&gt;probmodel y1 y2 y3;&lt;/P&gt;&lt;P&gt;output out=fmm predicted residual;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 27 Dec 2013 20:25:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177108#M9201</guid>
      <dc:creator>cd2011</dc:creator>
      <dc:date>2013-12-27T20:25:57Z</dc:date>
    </item>
    <item>
      <title>Re: Model selection when dependent variables consists of zeros</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177109#M9202</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I hadn't even considered the dist=constant--that's clever, and it makes it look more like a hurdle model, which would fit the process better, I think.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Dec 2013 18:04:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Model-selection-when-dependent-variables-consists-of-zeros/m-p/177109#M9202</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2013-12-30T18:04:09Z</dc:date>
    </item>
  </channel>
</rss>

