<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OLS model for large zero inflated dataset in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591803#M28940</link>
    <description>Thanks for your help. See my reply to Dave.&lt;BR /&gt;Regards&lt;BR /&gt;</description>
    <pubDate>Thu, 26 Sep 2019 09:44:59 GMT</pubDate>
    <dc:creator>daltonchris7720</dc:creator>
    <dc:date>2019-09-26T09:44:59Z</dc:date>
    <item>
      <title>OLS model for large zero inflated dataset</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591344#M28922</link>
      <description>&lt;P&gt;Hello SAS experts&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In a couple of earlier threads I have been discussing modelling out-of-pocket costs for members of a health plan. There are a large number of zeros in this very large dataset (about 80% of 5.5 million observations), and the non-zero out-of-pocket amounts are highly right-skewed (from a few dollars to $55,000).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I've tried PROC HPFMM and PROC HPGENSELECT with Tweedie, gamma, or ZINB distributions, although these are not really count data. The zeros are not a latent class variable either, as they don't come from a different process to that of the members who get an out-of-pocket charge. So I don't think finite mixture models, hurdle models or zero-inflated models really work here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Theoretically the Tweedie distribution should work, but interestingly just doing plain ol' OLS with PROC HPREG seems to work best, with the lowest AIC (by a long way compared with all these models); looking at the raw residuals (attached) from the OLS model shows the bimodal distribution quite well. Is this really valid, though, given the highly non-normal distribution of this outcome variable?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Modelling the out-of-pocket cost as binary with PROC HPLOGISTIC works well, but I was trying to get a model which predicts the actual out-of-pocket amount, hence the attempts with the continuous outcome.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thoughts appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Sep 2019 01:23:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591344#M28922</guid>
      <dc:creator>daltonchris7720</dc:creator>
      <dc:date>2019-09-25T01:23:12Z</dc:date>
    </item>
    <item>
      <title>Re: OLS model for large zero inflated dataset</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591379#M28923</link>
      <description>&lt;P&gt;I wouldn't want to send you on a wild goose chase, but how about a finite mixture of a constant (0) and an exponential?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Sep 2019 04:26:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591379#M28923</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-09-25T04:26:55Z</dc:date>
    </item>
    <item>
      <title>Re: OLS model for large zero inflated dataset</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591546#M28924</link>
      <description>&lt;P&gt;A zero-inflated gamma model, which can be done in PROC FMM, would also be a possibility. It would be appropriate for positive, right-skewed, continuous data with a point mass at zero.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Sep 2019 14:24:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591546#M28924</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2019-09-25T14:24:36Z</dc:date>
    </item>
    <item>
      <title>Re: OLS model for large zero inflated dataset</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591802#M28939</link>
      <description>&lt;P&gt;Thanks both for your replies.&lt;/P&gt;&lt;P&gt;I tried the FMM models but got really weird predictions. Rick helped me with this in an earlier post.&lt;/P&gt;&lt;P&gt;He pointed out the following in the Details tab of the PROC HPFMM help information, under the sub-tab "Log likelihood of the response distributions":&lt;/P&gt;&lt;P&gt;"While it is syntactically valid to mix a constant distribution with a continuous distribution, such as DIST=LOGNORMAL, such a mixture is not mathematically appropriate, because the constant log-likelihood is the log of a probability, while a continuous log-likelihood is the log of a probability density function. If you want to mix a constant distribution with a continuous distribution, you could model the constant as a very narrow continuous distribution, such as DIST=UNIFORM(c-delta, c+delta) for a small value delta. However, using PROC HPFMM to analyze such mixtures is sensitive to numerical inaccuracy and ultimately unnecessary. Instead, the following approach is mathematically equivalent and more numerically stable:&lt;BR /&gt;Estimate the mixing probability as the proportion of observations in the data set such that |y_i - c| &amp;lt; epsilon.&lt;BR /&gt;Estimate the parameters of the continuous distribution from the observations for which |y_i - c| &amp;gt;= epsilon."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry, the equations won't copy. The link is&lt;/P&gt;&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_hpfmm_details07.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_hpfmm_details07.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I wasn't sure how to code this suggestion.
Using the uniform as they suggested didn't work either.&lt;/P&gt;&lt;P&gt;My interpretation of FMM models is that they are two-part models, with one process for getting a zero and another for getting a positive outcome, which is not what is happening here.&lt;/P&gt;&lt;P&gt;Anyway, I was really just wanting someone to say either that OLS with PROC HPREG is completely wrong (which is what I think, though it seems to work), or that you could use it, but...&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;</description>
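The two-step approach quoted from the PROC HPFMM documentation above can be sketched outside SAS. This is only an illustrative sketch in Python, not SAS code and not the documentation's own implementation. It assumes the constant is c = 0, that the continuous part is lognormal (the distribution the quote mentions), and that all non-constant values are positive; the function names and the sample data are hypothetical.

```python
import math
from statistics import fmean, pstdev


def fit_zero_plus_lognormal(ys, c=0.0, eps=1e-8):
    """Two-step fit of a point mass at c mixed with a lognormal (assumed)."""
    # Observations that are not numerically equal to the constant c.
    positives = [y for y in ys if abs(y - c) >= eps]
    # Step 1: mixing probability = share of observations within eps of c.
    p_const = 1.0 - len(positives) / len(ys)
    # Step 2: lognormal maximum likelihood has a closed form on the log scale
    # (mean and population standard deviation of the logged values).
    # math.log raises ValueError if any remaining value is not positive.
    logs = [math.log(y) for y in positives]
    mu = fmean(logs)
    sigma = pstdev(logs, mu)
    return p_const, mu, sigma


def expected_value(p_const, mu, sigma, c=0.0):
    """Mean of the mixture: p*c + (1 - p) * mean of the lognormal part."""
    return p_const * c + (1.0 - p_const) * math.exp(mu + sigma ** 2 / 2.0)


# Made-up out-of-pocket amounts, for illustration only.
costs = [0.0, 0.0, 0.0, 0.0, 12.5, 0.0, 0.0, 340.0, 0.0, 55.0]
p_const, mu, sigma = fit_zero_plus_lognormal(costs)
print(p_const, mu, sigma, expected_value(p_const, mu, sigma))
```

This sketch has no covariates. With predictors, the same split would presumably mean one model for whether the outcome equals the constant and a separate model fitted only to the remaining observations.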
      <pubDate>Thu, 26 Sep 2019 09:41:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591802#M28939</guid>
      <dc:creator>daltonchris7720</dc:creator>
      <dc:date>2019-09-26T09:41:18Z</dc:date>
    </item>
    <item>
      <title>Re: OLS model for large zero inflated dataset</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591803#M28940</link>
      <description>Thanks for your help. See my reply to Dave.&lt;BR /&gt;Regards&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Sep 2019 09:44:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/OLS-model-for-large-zero-inflated-dataset/m-p/591803#M28940</guid>
      <dc:creator>daltonchris7720</dc:creator>
      <dc:date>2019-09-26T09:44:59Z</dc:date>
    </item>
  </channel>
</rss>

