<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How should I simulate Zero inflated data with extra conditions in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447361#M69518</link>
    <description>&lt;P&gt;I want to simulate zero-inflated Poisson data sets 1000 times. I am using predicted coefficients and independent variables as set in the code below, but I have to include two more conditions:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. I want 70% of the resulting counts (the simulated y) in the interval [5,14] to be rounded off to 10 in each data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first code below simulates each observation 1000 times; I then use the second code to sort the data so that I have 1000 data sets. The rounding is required within each data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let NumSamples = 1000;  /* number of samples */
data Sim_data(drop=mu);
set WORK.xxx;
call streaminit(1234);
do SampleID=1 to &amp;amp;NumSamples; 
ObsNum = _N_; 
mu = exp(0.6773 - 0.01036*Age + 0.2080*height);
ypoi = ranpoi(1234,mu);
pzero = cdf("LOGISTIC",-0.9792 +0.03796*Age -0.1488*height);
if ranuni(1234)&amp;gt;pzero then do;
ypoizim = ypoi;
end;
else do;
ypoizim = 0;
end;
y=ypoizim;
output;
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=Sim_data; 
by SampleID ObsNum;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;2. Likewise, I want 20% of the resulting counts in the interval [25,35] to be rounded off to 30.&lt;/P&gt;</description>
    <pubDate>Wed, 21 Mar 2018 10:23:59 GMT</pubDate>
    <dc:creator>john1111</dc:creator>
    <dc:date>2018-03-21T10:23:59Z</dc:date>
    <item>
      <title>How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447361#M69518</link>
      <description>&lt;P&gt;I want to simulate zero-inflated Poisson data sets 1000 times. I am using predicted coefficients and independent variables as set in the code below, but I have to include two more conditions:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. I want 70% of the resulting counts (the simulated y) in the interval [5,14] to be rounded off to 10 in each data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first code below simulates each observation 1000 times; I then use the second code to sort the data so that I have 1000 data sets. The rounding is required within each data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let NumSamples = 1000;  /* number of samples */
data Sim_data(drop=mu);
set WORK.xxx;
call streaminit(1234);
do SampleID=1 to &amp;amp;NumSamples; 
ObsNum = _N_; 
mu = exp(0.6773 - 0.01036*Age + 0.2080*height);
ypoi = ranpoi(1234,mu);
pzero = cdf("LOGISTIC",-0.9792 +0.03796*Age -0.1488*height);
if ranuni(1234)&amp;gt;pzero then do;
ypoizim = ypoi;
end;
else do;
ypoizim = 0;
end;
y=ypoizim;
output;
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=Sim_data; 
by SampleID ObsNum;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;2. Likewise, I want 20% of the resulting counts in the interval [25,35] to be rounded off to 30.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Mar 2018 10:23:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447361#M69518</guid>
      <dc:creator>john1111</dc:creator>
      <dc:date>2018-03-21T10:23:59Z</dc:date>
    </item>
    <item>
      <title>Re: How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447389#M69520</link>
      <description>&lt;P&gt;Calling&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Mar 2018 12:27:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447389#M69520</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-03-21T12:27:29Z</dc:date>
    </item>
    <item>
      <title>Re: How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447399#M69522</link>
      <description>&lt;P&gt;Probably you want something similar to this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let NumSamples = 5;  /* number of samples */
data Sim_data;
set WORK.xxx;
call streaminit(1234);
do SampleID=1 to &amp;amp;NumSamples; 
   ObsNum = _N_; 
   mu = exp(0.6773 - 0.01036*Age + 0.2080*height);
   ypoi = rand("poisson", mu);
   pzero = cdf("LOGISTIC",-0.9792 +0.03796*Age -0.1488*height);
   /* pzero is the probability of a structural zero, so a success means y = 0 */
   if rand("Bernoulli", pzero) then 
      ypoizim = 0;
   else
      ypoizim = ypoi;
   y = ypoizim;

   /* I want 70% of the resulting counts (the simulated y) in the interval [5,14] 
      to be rounded off to 10 for each data set */
   if (5 &amp;lt;= y &amp;lt;= 14) &amp;amp; rand("Bernoulli", 0.7) then
      y10 = round(y, 10);  /* round to nearest 10 */
   else 
      y10 = y;

   /* Again the same thing, rounding off 20% of resulting counts 
      in the interval [25,35] to 30 */
   if (25 &amp;lt;= y &amp;lt;= 35) &amp;amp; rand("Bernoulli", 0.2) then
      y30 = 30;  /* assign 30 directly: round(35, 10) would give 40 */
   else 
      y30 = y;
   output;
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
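&lt;P&gt;To see how the rounding plays out in each simulated data set, one option is to tabulate the counts by sample. This is only a minimal sketch: it assumes the Sim_data data set and the y, y10, and y30 variables created above, and it sorts first because BY-group processing needs the data ordered by SampleID.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* order the simulated data so that each SampleID forms one data set */
proc sort data=Sim_data;
   by SampleID ObsNum;
run;

/* one set of frequency tables per simulated data set */
proc freq data=Sim_data;
   by SampleID;
   tables y y10 y30 / nocum;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because each eligible count is rounded independently with probability 0.7 (or 0.2), the rounded share in any one sample is about 70% (or 20%) on average rather than exactly that value.&lt;/P&gt;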
&lt;P&gt;Incidentally, you are setting the random number seed by using STREAMINIT, but then you are using the old-style random number functions instead of RAND. &lt;A href="https://blogs.sas.com/content/iml/2013/07/10/stop-using-ranuni.html" target="_self"&gt;If you are doing serious work, you should use RAND&lt;/A&gt;, which has better statistical properties. I updated the code to use RAND.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Mar 2018 13:03:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447399#M69522</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2018-03-21T13:03:15Z</dc:date>
    </item>
    <item>
      <title>Re: How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447691#M69548</link>
      <description>&lt;P&gt;Thanks a lot for the solution. That is precisely what I was looking for, but I want to confirm something. I am developing a model for heaped data, and I plan to use simulation to verify that the heaped model performs better for parameter estimation. So I simulated my original count observations and then rounded off some of them to get data similar to the real, observed data. Is it statistically correct to simulate the data like this? I suspect that the 1000 simulated data sets may differ so much from the original data set that I cannot make meaningful comparisons.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My suspicion comes from the fact that after simulation I can't see the heaping in each data set, and there are far too many zeros. I expect the data to be zero-inflated, but this is more extreme than that. In general, my simulated data sets lose the properties of the original data to the extent that I can't fit the model to them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I ensure that the properties of my data are maintained? My original data had counts from 0 to 60, but my simulated data has counts only from 0 to 13. My original data was heaped at 25, so obviously I won't see that heap because the simulated data contains no 25s. How can I simulate heaped data?&lt;/P&gt;</description>
      <pubDate>Thu, 22 Mar 2018 08:21:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447691#M69548</guid>
      <dc:creator>john1111</dc:creator>
      <dc:date>2018-03-22T08:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447707#M69550</link>
      <description>&lt;P&gt;Could you explain what you mean by "a heaped model" and "heaped at 25"? Or post data, if possible.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The key problem in constructing a simulation study is how to simulate the data. Most people do what you did: propose a MODEL of the data and simulate from the MODEL. Your results will match the data only if the model is a good fit. If the observed data has properties that the model does not (skewness, outliers, etc.), then the results of the study might not capture those properties.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
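&lt;P&gt;One rough way to see what the model implies before simulating is to summarize its parameters over the input data. The sketch below assumes the WORK.xxx data set and the coefficients from the original post; the names ModelCheck, ExpectedY, and ProbZero are made up for illustration. It uses the standard zero-inflated Poisson results that the mean is (1 - pzero)*mu and that the probability of a zero is pzero + (1 - pzero)*exp(-mu).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ModelCheck;
   set WORK.xxx;
   mu    = exp(0.6773 - 0.01036*Age + 0.2080*height);
   pzero = cdf("LOGISTIC", -0.9792 + 0.03796*Age - 0.1488*height);
   ExpectedY = (1 - pzero)*mu;                /* mean of a ZIP response */
   ProbZero  = pzero + (1 - pzero)*exp(-mu);  /* P(y = 0) under the ZIP model */
run;

/* compare these ranges with the observed counts */
proc means data=ModelCheck min mean max;
   var mu pzero ExpectedY ProbZero;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the largest mu is small compared with the largest observed count, the simulated counts will rarely reach the upper part of the observed range.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;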
&lt;P&gt;If you can't find a good parametric model for the data, you can use a flexible family of distributions (see Chapter 16 of &lt;EM&gt;Simulating Data with SAS&lt;/EM&gt;) or use the bootstrap, which is a nonparametric model (see Chapter 15). For more about ways to simulate data, see the diagram and discussion at &lt;A href="https://blogs.sas.com/content/iml/2018/01/15/eyeball-distribution.html" target="_self"&gt;https://blogs.sas.com/content/iml/2018/01/15/eyeball-distribution.html&lt;/A&gt;&lt;/P&gt;
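&lt;P&gt;As a rough illustration of the bootstrap option, PROC SURVEYSELECT can draw the resamples. This is a sketch under assumptions, not a tested recipe for these data: it assumes WORK.xxx holds the observed data including the response, and BootSamples is a made-up output name.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* draw 1000 bootstrap samples, each the same size as the original data */
proc surveyselect data=WORK.xxx noprint seed=1234
     out=BootSamples
     method=urs      /* unrestricted random sampling = sampling with replacement */
     samprate=1      /* each resample has as many rows as the input */
     outhits         /* write one row per selection */
     reps=1000;      /* number of bootstrap samples */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The output data set contains a Replicate variable that identifies each resample, so later analyses can use BY Replicate. Because the rows are resampled from the observed data, features such as heaping carry over into each resample.&lt;/P&gt;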
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Mar 2018 10:05:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447707#M69550</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2018-03-22T10:05:26Z</dc:date>
    </item>
    <item>
      <title>Re: How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447719#M69553</link>
      <description>&lt;P&gt;The data is attached. I call it heaped at the points 5,10,12,15,20,15 because of the frequency of these values, as shown in the plot below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have proposed a model and I want to use simulation to verify either:&lt;/P&gt;&lt;P&gt;1. That the model gives estimates of the counts, y, that are very close to the observed y,&lt;/P&gt;&lt;P&gt;OR&lt;/P&gt;&lt;P&gt;2. That the model provides better results based on the standard errors of the estimates.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="heaped_plot.PNG" style="width: 575px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/19375iC52A855955974F95/image-size/large?v=v2&amp;amp;px=999" role="button" title="heaped_plot.PNG" alt="heaped_plot.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Mar 2018 10:35:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447719#M69553</guid>
      <dc:creator>john1111</dc:creator>
      <dc:date>2018-03-22T10:35:41Z</dc:date>
    </item>
    <item>
      <title>Re: How should I simulate Zero inflated data with extra conditions</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447745#M69557</link>
      <description>&lt;P&gt;I see. So the response would be something like "How long has it been since you visited a foreign country?" Many people will reply "5 years" or "10 years" because they are estimating something that happened long ago.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I do not have any experience modeling data like these. I suggest you do a literature&amp;nbsp;review.&amp;nbsp; For example, &lt;A href="https://www.cbs.nl/-/media/imported/documents/2011/08/2011-x10-08.pdf" target="_self"&gt;the paper by&amp;nbsp;van der Laan and Kuijvenhoven (2011)&lt;/A&gt;&amp;nbsp;would be one way to treat this issue. A mixture model would be another.&amp;nbsp; Good luck!&lt;/P&gt;</description>
      <pubDate>Thu, 22 Mar 2018 12:42:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-should-I-simulate-Zero-inflated-data-with-extra-conditions/m-p/447745#M69557</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2018-03-22T12:42:57Z</dc:date>
    </item>
  </channel>
</rss>

