<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc Survey Select - Stratified Random Sampling in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/437680#M69037</link>
    <description></description>
    <pubDate>Thu, 15 Feb 2018 16:29:52 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2018-02-15T16:29:52Z</dc:date>
    <item>
      <title>Proc Survey Select - Stratified Random Sampling</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/437481#M69025</link>
      <description>&lt;P&gt;When using PROC SURVEYSELECT (SAS 9.4) to split my data into Train and Validation data sets 80-20, I find that the number of records in the Train set is not exactly 80% of the original (nor exactly 20% in the Validation set). Is this normal?&amp;nbsp;One of my strata variables contains missing values (about 2% of variable D is missing).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The original data set, &lt;EM&gt;overall_new&lt;/EM&gt;, contains 15,573 rows, and 80% of that is 12,458 rows.&lt;/P&gt;
&lt;P&gt;Of the resulting data sets, the Train set, &lt;EM&gt;intime_TR&lt;/EM&gt;, has 12,475 rows, which is more than 12,458. Any ideas why this might be so?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
proc surveyselect data=overall_new
                  out=sorted_intime
                  noprint
                  seed=1234
                  method=srs
                  samprate=80
                  outall;
   strata A B C D;
run;

/* 12475 rows */
data intime_TR;
   set sorted_intime;
   if selected = 1;
run;

/* 3098 rows */
data intime_VAL;
   set sorted_intime;
   if selected = 0;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Feb 2018 10:10:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/437481#M69025</guid>
      <dc:creator>nstdt</dc:creator>
      <dc:date>2018-02-15T10:10:08Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Survey Select - Stratified Random Sampling</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/437680#M69037</link>
      <description>&lt;P&gt;From the documentation:&lt;/P&gt;
&lt;P&gt;PROC SURVEYSELECT treats missing values of &lt;A href="http://127.0.0.1:52151/help/statug.hlp/statug_surveyselect_syntax07.htm" target="_blank"&gt;STRATA&lt;/A&gt; and &lt;A href="http://127.0.0.1:52151/help/statug.hlp/statug_surveyselect_syntax05.htm" target="_blank"&gt;SAMPLINGUNIT&lt;/A&gt; variables like any other STRATA or SAMPLINGUNIT variable value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This means the missing values of D form one more stratum level in addition to its non-missing values, which adds extra combinations with A, B and C and is likely contributing to the issue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Consider a stratum that has only 7 members when you request a samprate of 80. How many would you expect in the output? (Hint: 7 * 0.8 = 5.6, which rounds to 6.) The same happens with 80 percent of 23, or of practically any stratum size that is not a multiple of 5.&lt;/P&gt;
&lt;P&gt;You may be seeing many such round-up effects because of the sizes of your strata.&lt;/P&gt;
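&lt;P&gt;To see the effect in isolation, here is a small sketch with made-up stratum sizes (not your data). It assumes the procedure rounds each stratum's sample size up to the next integer, which is how I read the SAMPRATE= documentation; the data set and variable names are just for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
/* Hypothetical stratum sizes, for illustration only */
data rounding_demo;
   input stratum_size @@;
   exact   = stratum_size * 0.80;   /* 80 percent before rounding */
   rounded = ceil(exact);           /* assumed round-up rule */
   extra   = rounded - exact;       /* surplus caused by rounding */
   datalines;
7 23 24 25
;
run;

/* List each stratum and total the three columns */
proc print data=rounding_demo noobs;
   sum exact rounded extra;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Under that assumption a stratum of 7 contributes 6 records and a stratum of 23 contributes 19, so the totals show the selected count drifting above an exact 80 percent.&lt;/P&gt;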
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Run this code and see how many records you have per combination of the strata variables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
proc freq data=overall_new;
   tables a*b*c*d / list missing;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You don't mention how many levels each of your strata variables has, but if there are 5 each and the records are roughly evenly distributed, that is 625 combinations and only about 25 records per combination (15,573 / 625). With more levels the number of records per combination drops further, and the rounding to 80 percent within each stratum matters more.&lt;/P&gt;
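&lt;P&gt;A rough way to check whether rounding accounts for the extra rows is to count every A*B*C*D combination and total the per-stratum sizes. This is only a sketch: the round-up rule is my assumption about how SAMPRATE= is applied, and strata_counts, strata_sizes, n_avail and n_rounded are names I made up.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
/* Count records in every A*B*C*D combination, keeping missing levels */
proc summary data=overall_new nway missing;
   class A B C D;
   output out=strata_counts (drop=_type_ rename=(_freq_=n_avail));
run;

/* Per-stratum sample size under the assumed round-up rule */
data strata_sizes;
   set strata_counts;
   n_rounded = ceil(0.80 * n_avail);
run;

/* Compare the summed per-stratum sizes with 80 percent of all rows */
proc sql;
   select count(*)       as n_strata,
          sum(n_avail)   as n_total,
          sum(n_rounded) as n_sample
   from strata_sizes;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If n_sample matches the 12475 rows you got, the difference from 12,458 is just per-stratum rounding.&lt;/P&gt;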
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might be better served by summarizing the input data by the strata variables to get an explicit count of available records (PROC MEANS or PROC SUMMARY; don't forget the MISSING option), using a data step to do your own rounding per combination, and supplying the result as a SAMPSIZE= data set.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Feb 2018 16:29:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/437680#M69037</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-02-15T16:29:52Z</dc:date>
    </item>
  </channel>
</rss>

