<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc Surveyselect: If variable=1 always include this row in sample in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593423#M28985</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;: My understanding was that the OP had already determined sample sizes based on the strata (=bin) proportions in the &lt;FONT face="courier new,courier"&gt;application_data&lt;/FONT&gt;. The remaining task was just to &lt;EM&gt;prioritize&lt;/EM&gt; the flagged observations when sampling from the &lt;FONT face="courier new,courier"&gt;reference_data&lt;/FONT&gt;, but without exceeding the planned sample sizes per stratum.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, if the planned sample size for a particular stratum in the&amp;nbsp;&lt;FONT face="courier new,courier"&gt;reference_data&lt;/FONT&gt; was 5 out of 20 (i.e.,&amp;nbsp;each item had a selection probability of 0.25 in the case of simple random sampling) and there were 2 items with Flag=1 in that stratum, the sample should be composed of: the 2 flagged items (selection probability 1) plus 3 out of the 18 items with Flag=0 (thus with a reduced selection probability of 1/6=0.1666...). If, however, 8 items in that stratum had Flag=1, only a random sample of 5 out of these 8 would be drawn (selection probability 0.625 -- always assuming simple random sampling) and none of the remaining 12 items with Flag=0 would be selected.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does this answer your question?&lt;/P&gt;</description>
    <pubDate>Wed, 02 Oct 2019 15:16:49 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2019-10-02T15:16:49Z</dc:date>
    <item>
      <title>Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593279#M28974</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a rather specific problem and cannot find any option to solve it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have two datasets:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;reference_data&lt;/LI&gt;&lt;LI&gt;application_data&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Both have been binned by the same method, and the bin of each row is stored in a column called "Bin". Furthermore, I have a variable called "Flag" that is either 0 or 1, with the majority of rows set to 0 (since 1 marks an outlier, which is rare but important).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example &lt;EM&gt;reference_data&lt;/EM&gt; set:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;Var1&lt;/TH&gt;&lt;TH&gt;Var2&lt;/TH&gt;&lt;TH&gt;...&lt;/TH&gt;&lt;TH&gt;Bin&lt;/TH&gt;&lt;TH&gt;Flag&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;5.12&lt;/TD&gt;&lt;TD&gt;0.15&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;5.78&lt;/TD&gt;&lt;TD&gt;0.28&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;3.45&lt;/TD&gt;&lt;TD&gt;0.91&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;3.94&lt;/TD&gt;&lt;TD&gt;0.85&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;8.23&lt;/TD&gt;&lt;TD&gt;0.17&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From the &lt;EM&gt;application_data&lt;/EM&gt; set I can calculate the percentage of data within each bin, which I use to calculate SAMPSIZE for PROC SURVEYSELECT on the &lt;EM&gt;reference_data&lt;/EM&gt;, e.g.:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Bin 1 = 5.12%&lt;/LI&gt;&lt;LI&gt;Bin 2 = 26.87%&lt;/LI&gt;&lt;LI&gt;...&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using SAMPSIZE, I then draw random samples from the &lt;EM&gt;reference_data&lt;/EM&gt;, so that the composition of the &lt;EM&gt;stratified_reference_data&lt;/EM&gt; matches the composition of &lt;EM&gt;application_data&lt;/EM&gt;. This gives me a perfectly stratified reference_data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I need to make sure that all rows with "Flag = 1" are included in the &lt;EM&gt;stratified reference_data&lt;/EM&gt; set (in this respect, random sampling has to be overridden), except when this would give the stratified reference_data a different composition than application_data, which would result in a bad Population Stability Index.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope that I explained the problem clearly; otherwise let me know and I will try to clarify.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance and best regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Weiler&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 10:17:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593279#M28974</guid>
      <dc:creator>MWeiler</dc:creator>
      <dc:date>2019-10-02T10:17:37Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593290#M28975</link>
      <description>&lt;P&gt;Because I have hardly any experience in statistics, I need to see an example of what you have, what you get now and what you need. My first thought was to add the required obs to the result dataset after PROC SURVEYSELECT. Maybe (most likely) a bad idea.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 10:47:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593290#M28975</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2019-10-02T10:47:12Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593312#M28977</link>
      <description>&lt;P&gt;I believe you need the CERTSIZE option. However, I cannot find an example and I haven't figured out how to use it. This paper may be of some help: it mentions CERTSIZE and why one might want to select observations with certainty, but it does not provide an example that I can see.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.mwsug.org/proceedings/2013/AA/MWSUG-2013-AA02.pdf" target="_blank"&gt;https://www.mwsug.org/proceedings/2013/AA/MWSUG-2013-AA02.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 11:48:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593312#M28977</guid>
      <dc:creator>data_null__</dc:creator>
      <dc:date>2019-10-02T11:48:40Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593316#M28978</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/293322"&gt;@MWeiler&lt;/a&gt;&amp;nbsp;and welcome to the SAS Support Communities!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I assume that you use a dataset containing stratum (i.e. bin) sample sizes in the SAMPSIZE= option (let's call it &lt;FONT face="courier new,courier"&gt;size_data&lt;/FONT&gt;). You could create a subset &lt;FONT face="courier new,courier"&gt;size_data1&lt;/FONT&gt; of &lt;FONT face="courier new,courier"&gt;size_data&lt;/FONT&gt; containing only the bins with at least one observation with Flag=1 in the &lt;FONT face="courier new,courier"&gt;reference_data.&lt;/FONT&gt; Using &lt;FONT face="courier new,courier"&gt;size_data1&lt;/FONT&gt;&amp;nbsp;in the SAMPSIZE= option when sampling from&amp;nbsp;&lt;FONT face="courier new,courier"&gt;reference_data&lt;STRONG&gt;(where=(flag=1))&lt;/STRONG&gt;&lt;/FONT&gt; would yield the first part of the intended sample. You will need to use the &lt;A href="https://documentation.sas.com/?docsetId=statug&amp;amp;docsetTarget=statug_surveyselect_syntax01.htm&amp;amp;docsetVersion=14.3&amp;amp;locale=en#statug.surveyselect.selectselectall" target="_blank" rel="noopener"&gt;SELECTALL option&lt;/A&gt; in that PROC SURVEYSELECT step because most sample sizes will likely exceed the number of flagged observations.&lt;/P&gt;
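&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For illustration only, here is a minimal Python sketch (not SAS code, and not how PROC SURVEYSELECT is implemented) of what this first step does: within each bin that contains flagged rows, it draws a simple random sample from those rows and, mimicking SELECTALL, takes all of them whenever the planned size is at least their number. The names flagged_by_bin and planned_size are hypothetical stand-ins for reference_data(where=(flag=1)) and size_data, and the row ids are made up.&lt;/P&gt;
&lt;PRE&gt;
```python
import random

def select_flagged(flagged_by_bin, planned_size, seed=0):
    """Phase 1 sketch: sample Flag=1 rows within each bin.

    flagged_by_bin: dict mapping bin to the list of row ids with Flag=1
                    (only bins that contain flagged rows, like size_data1)
    planned_size:   dict mapping bin to its planned sample size (size_data)
    """
    rng = random.Random(seed)
    sample = {}
    for b, rows in flagged_by_bin.items():
        # SELECTALL-like rule: if the planned size is at least the number
        # of flagged rows, every flagged row is taken; otherwise a simple
        # random sample of the planned size is drawn from them.
        k = min(planned_size[b], len(rows))
        sample[b] = sorted(rng.sample(rows, k))
    return sample

# Hypothetical example: bin 1 has one flagged row, bin 2 has eight.
flagged = {1: [3], 2: [10, 11, 12, 13, 14, 15, 16, 17]}
sizes = {0: 4, 1: 5, 2: 5}
phase1 = select_flagged(flagged, sizes)
print(phase1[1])       # [3]
print(len(phase1[2]))  # 5
```
&lt;/PRE&gt;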
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The second part of the sample would be drawn from&amp;nbsp;&lt;FONT face="courier new,courier"&gt;reference_data&lt;STRONG&gt;(where=(flag=0))&lt;/STRONG&gt;&lt;/FONT&gt; using a SAMPSIZE dataset, say &lt;FONT face="courier new,courier"&gt;size_data0&lt;/FONT&gt;, which contains for each bin the original sample size (from &lt;FONT face="courier new,courier"&gt;size_data&lt;/FONT&gt;) minus the frequency observed in the first part of the sample.&lt;/P&gt;
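&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The second step can be sketched the same way (again hypothetical Python, not SAS): the remaining size per bin is the planned size minus what the first step took, and a simple random sample of that many Flag=0 rows is drawn per bin. The dictionaries are made-up stand-ins for size_data, the first-step output and reference_data(where=(flag=0)).&lt;/P&gt;
&lt;PRE&gt;
```python
import random

def remaining_sizes(planned_size, phase1_sample):
    """size_data0 sketch: planned size per bin minus the phase-1 frequency."""
    return {b: n - len(phase1_sample.get(b, []))
            for b, n in planned_size.items()}

def select_unflagged(unflagged_by_bin, sizes0, seed=0):
    """Phase 2 sketch: simple random sample of Flag=0 rows per bin.
    Bins whose remaining size is 0 are skipped; rng.sample raises
    ValueError if a bin has fewer Flag=0 rows than requested."""
    rng = random.Random(seed)
    return {b: sorted(rng.sample(unflagged_by_bin[b], k))
            for b, k in sizes0.items() if k}

planned = {0: 4, 1: 5, 2: 5}
phase1 = {1: [3], 2: [10, 11, 12, 13, 14]}   # hypothetical phase-1 output
unflagged = {0: [0, 1, 2, 4, 5], 1: [6, 7, 8, 9], 2: [20, 21]}

sizes0 = remaining_sizes(planned, phase1)
print(sizes0)     # {0: 4, 1: 4, 2: 0}
phase2 = select_unflagged(unflagged, sizes0)
print(phase2[1])  # [6, 7, 8, 9]
```
&lt;/PRE&gt;
&lt;P&gt;Combining both parts, each bin ends up with exactly its planned size, as long as it has enough rows overall.&lt;/P&gt;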
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does this make sense?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 11:56:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593316#M28978</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2019-10-02T11:56:58Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593362#M28981</link>
      <description>&lt;P&gt;Hello FreelanceReinh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, this makes sense, and it is what I actually got working just now.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your feedback &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I was wondering whether there is a built-in way of doing this.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 13:23:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593362#M28981</guid>
      <dc:creator>MWeiler</dc:creator>
      <dc:date>2019-10-02T13:23:50Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593363#M28982</link>
      <description>&lt;P&gt;Sir&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp; &amp;nbsp;When and if you have a moment, could you help me and other folks understand whether that approach uses equal selection probabilities or probabilities that depend on the stratum sample sizes/rates? Thank you in advance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PS: You are likely to ask whether I read the doc. Yes, I did, but my dumb brain didn't understand it, so I'm asking.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 13:30:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593363#M28982</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-10-02T13:30:33Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593396#M28983</link>
      <description>&lt;P&gt;One way is to add that variable to the strata and set the selection rate for the Flag=1 strata to 100%.&lt;/P&gt;
&lt;P&gt;So your SAMPRATE information would look like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;BIN&lt;/TH&gt;&lt;TH&gt;flag&lt;/TH&gt;&lt;TH&gt;rate&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;.0512&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;.2687&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have no idea how complex that may be, since you did not provide any actual example code.&lt;/P&gt;
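&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough illustration of this idea, here is a minimal Python sketch (not SAS code; the rows and the function name are made up): the strata are the (bin, flag) combinations, and each stratum gets its own rate, with 1 meaning every row is taken. How the sketch turns a rate into a whole sample size (rounding up, capped at the stratum size) is an assumption of the sketch, not necessarily what PROC SURVEYSELECT does.&lt;/P&gt;
&lt;PRE&gt;
```python
import math
import random

def sample_by_rate(rows, rates, seed=0):
    """rows:  list of (row_id, bin, flag) tuples
    rates: dict mapping (bin, flag) to the sampling rate of that stratum"""
    rng = random.Random(seed)
    strata = {}
    for row_id, b, flag in rows:
        strata.setdefault((b, flag), []).append(row_id)
    sample = []
    for key, ids in strata.items():
        # Assumption of this sketch: size = rate * N rounded up, capped at N.
        k = min(len(ids), math.ceil(rates[key] * len(ids)))
        sample.extend(rng.sample(ids, k))
    return sorted(sample)

rates = {(1, 0): 0.0512, (1, 1): 1, (2, 0): 0.2687, (2, 1): 1}
rows = ([(i, 1, 0) for i in range(20)] + [(100, 1, 1), (101, 1, 1)]
        + [(i, 2, 0) for i in range(200, 210)] + [(300, 2, 1)])
sample = sample_by_rate(rows, rates)
print(len(sample))  # 8: 2 + 2 from bin 1, 3 + 1 from bin 2
```
&lt;/PRE&gt;
&lt;P&gt;With this design every flagged row is kept, so each bin's sample size becomes the number of its flagged rows plus the sampled share of its unflagged rows.&lt;/P&gt;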
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 14:41:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593396#M28983</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-10-02T14:41:03Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593400#M28984</link>
      <description>&lt;P&gt;Glad to see that my suggestion helped you. Indeed, I wouldn't be surprised if more frequent users of PROC SURVEYSELECT came up with some built-in functionality that could serve your purpose (which seems quite common to me), be it CERTSIZE as suggested by data_null__ or something else. There are so many options that I've never used but would like to learn more about.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 14:44:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593400#M28984</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2019-10-02T14:44:49Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593423#M28985</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;: My understanding was that the OP had already determined sample sizes based on the strata (=bin) proportions in the &lt;FONT face="courier new,courier"&gt;application_data&lt;/FONT&gt;. The remaining task was just to &lt;EM&gt;prioritize&lt;/EM&gt; the flagged observations when sampling from the &lt;FONT face="courier new,courier"&gt;reference_data&lt;/FONT&gt;, but without exceeding the planned sample sizes per stratum.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, if the planned sample size for a particular stratum in the&amp;nbsp;&lt;FONT face="courier new,courier"&gt;reference_data&lt;/FONT&gt; was 5 out of 20 (i.e.,&amp;nbsp;each item had a selection probability of 0.25 in the case of simple random sampling) and there were 2 items with Flag=1 in that stratum, the sample should be composed of: the 2 flagged items (selection probability 1) plus 3 out of the 18 items with Flag=0 (thus with a reduced selection probability of 1/6=0.1666...). If, however, 8 items in that stratum had Flag=1, only a random sample of 5 out of these 8 would be drawn (selection probability 0.625 -- always assuming simple random sampling) and none of the remaining 12 items with Flag=0 would be selected.&lt;/P&gt;
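&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The arithmetic above can be checked with a small Python sketch (an illustration under the same simple-random-sampling assumption, not SAS code; the function name is hypothetical). Exact fractions are used so that 1/6 is not rounded.&lt;/P&gt;
&lt;PRE&gt;
```python
from fractions import Fraction

def inclusion_probabilities(n, n_flagged, n_unflagged):
    """Selection probabilities within one stratum of planned size n when
    flagged items are prioritized and SRS is used within each group.
    Returns (p_flagged, p_unflagged); None for an empty group."""
    take_flagged = min(n, n_flagged)                     # flagged items first
    take_unflagged = min(n - take_flagged, n_unflagged)  # fill up with Flag=0
    p_flagged = Fraction(take_flagged, n_flagged) if n_flagged else None
    p_unflagged = Fraction(take_unflagged, n_unflagged) if n_unflagged else None
    return p_flagged, p_unflagged

print(Fraction(5, 20))                    # 1/4 without prioritization
print(inclusion_probabilities(5, 2, 18))  # (Fraction(1, 1), Fraction(1, 6))
print(inclusion_probabilities(5, 8, 12))  # (Fraction(5, 8), Fraction(0, 1))
```
&lt;/PRE&gt;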
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does this answer your question?&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 15:16:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593423#M28985</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2019-10-02T15:16:49Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Surveyselect: If variable=1 always include this row in sample</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593426#M28986</link>
      <description>&lt;P&gt;Thank you Sir&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp; &amp;nbsp;that's exactly what I was after. Very neat! Appreciate your time!&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 15:19:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Proc-Surveyselect-If-variable-1-always-include-this-row-in/m-p/593426#M28986</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-10-02T15:19:00Z</dc:date>
    </item>
  </channel>
</rss>

