<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Taking a random sample using SURVEYSELECT that contains at least one record from each group in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/915033#M360587</link>
    <description>&lt;P&gt;Great&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;! Thank you and thanks for the quick response.&lt;/P&gt;&lt;P&gt;Gary&lt;/P&gt;</description>
    <pubDate>Thu, 08 Feb 2024 13:13:21 GMT</pubDate>
    <dc:creator>ghartge</dc:creator>
    <dc:date>2024-02-08T13:13:21Z</dc:date>
    <item>
      <title>Taking a random sample using SURVEYSELECT that contains at least one record from each group</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/914922#M360549</link>
      <description>&lt;P&gt;Greetings,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;OK, so I have a dataset where each row has&lt;/P&gt;&lt;P class="lia-align-justify"&gt;-course&lt;/P&gt;&lt;P class="lia-align-justify"&gt;-faculty&lt;/P&gt;&lt;P class="lia-align-justify"&gt;-student&lt;/P&gt;&lt;P class="lia-align-justify"&gt;-Group_Name&lt;/P&gt;&lt;P&gt;There are five different groups for the Group_Name value.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is it possible to use SURVEYSELECT to take a random sample where at least one record from each group is produced and keep my results to an overall N?&lt;/P&gt;&lt;P&gt;I have used PROC SURVEYSELECT to produce data where each group is represented using STRATA Group_Name;, but then each group's sample size is equal to my N. In other words, I get five groups of 31 in my output data (155 records) instead of a total of 31 records with each of the five groups represented at least once.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC SURVEYSELECT DATA = Data_In OUT = Data_Out&lt;BR /&gt;n=31&lt;BR /&gt;seed = 12345&lt;BR /&gt;method = srs;&lt;BR /&gt;STRATA Group_Name;&lt;BR /&gt;RUN ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To restate my question, I would like to produce only 31 records in my Data_Out dataset, but have each "Group" represented at least once.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Commenting out the "STRATA Group_Name;" line produces 31 records, equal to my N value, and has up to now produced at least one record from each group, but is this how PROC SURVEYSELECT functions or have I simply been lucky?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Gary&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 19:33:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/914922#M360549</guid>
      <dc:creator>ghartge</dc:creator>
      <dc:date>2024-02-07T19:33:37Z</dc:date>
    </item>
    <item>
      <title>Re: Taking a random sample using SURVEYSELECT that contains at least one record from each group</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/915018#M360581</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/99545"&gt;@ghartge&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think adding the ALLOC=PROP option to the STRATA statement should solve the problem:&lt;/P&gt;
&lt;PRE&gt;strata Group_Name&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt; / alloc=prop&lt;/STRONG&gt;&lt;/FONT&gt;;&lt;/PRE&gt;
&lt;P&gt;The &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_surveyselect_syntax07.htm#statug.surveyselect.allocmin" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt;&amp;nbsp;of the related ALLOCMIN= option (by which you could request at least &lt;EM&gt;n&lt;/EM&gt; observations per stratum) says: "&lt;SPAN&gt;By default, PROC SURVEYSELECT allocates at least one sampling unit to each stratum." At the same time, proportional allocation comes close to what a simple random sample would yield on average.&lt;/SPAN&gt;&lt;/P&gt;
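&lt;P&gt;As a minimal sketch (not a tested program), the complete step with proportional allocation might look as follows. It assumes the dataset names from the question (&lt;FONT face="courier new,courier"&gt;Data_In&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;Data_Out&lt;/FONT&gt;); &lt;FONT face="courier new,courier"&gt;Data_Sorted&lt;/FONT&gt; is a hypothetical intermediate name, needed only because the STRATA statement expects the input to be sorted by the strata variable. With ALLOC=PROP, N=31 is the total sample size to be allocated across the five groups, not the size per group.&lt;/P&gt;
&lt;PRE&gt;/* Sort by the strata variable (Data_Sorted is a hypothetical name) */
proc sort data=Data_In out=Data_Sorted;
by Group_Name;
run;

/* n=31 is the overall sample size here, split across groups in proportion to their sizes */
proc surveyselect data=Data_Sorted out=Data_Out
method=srs n=31
seed=12345;
strata Group_Name / alloc=prop;
run;

/* Check the result: every group should appear, and the counts are expected to total 31 */
proc freq data=Data_Out;
tables Group_Name;
run;&lt;/PRE&gt;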
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;U&gt;Edit:&lt;/U&gt; If you want to allow variability in the frequency distribution of variable &lt;FONT face="courier new,courier"&gt;Group_Name&lt;/FONT&gt; in the result, you can perform the selection in two steps:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN&gt;One randomly selected observation from each group.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;A simple random sample of 31−5=26 observations from the &lt;EM&gt;remaining&lt;/EM&gt; observations, without stratification.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;Code:&lt;/SPAN&gt;&lt;/P&gt;
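&lt;PRE&gt;/* Step 1: one random observation per group. OUTALL keeps all rows and flags the picks in Selected. */
/* data_in must be sorted by Group_Name for the STRATA statement. */
proc surveyselect data=data_in
method=srs n=&lt;STRONG&gt;1&lt;/STRONG&gt; &lt;STRONG&gt;outall&lt;/STRONG&gt;
seed=12345 out=&lt;STRONG&gt;step1&lt;/STRONG&gt;;
strata Group_Name;
run;

/* Step 2: simple random sample of 26 from the rows not picked in step 1. */
proc surveyselect data=step1(&lt;STRONG&gt;where=(not selected)&lt;/STRONG&gt;)
method=srs n=&lt;STRONG&gt;26&lt;/STRONG&gt;
seed=2718 out=&lt;STRONG&gt;step2&lt;/STRONG&gt;;
run;

/* Combine both selections, interleaved by Group_Name, and drop the bookkeeping variables from step 1. */
data want;
set &lt;STRONG&gt;step1&lt;/STRONG&gt;(&lt;STRONG&gt;where=(selected)&lt;/STRONG&gt;)
    &lt;STRONG&gt;step2&lt;/STRONG&gt;;
by Group_Name;
drop Selected SelectionProb SamplingWeight;
run;&lt;/PRE&gt;</description>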
      <pubDate>Thu, 08 Feb 2024 13:06:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/915018#M360581</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2024-02-08T13:06:51Z</dc:date>
    </item>
    <item>
      <title>Re: Taking a random sample using SURVEYSELECT that contains at least one record from each group</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/915033#M360587</link>
      <description>&lt;P&gt;Great&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;! Thank you and thanks for the quick response.&lt;/P&gt;&lt;P&gt;Gary&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 13:13:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Taking-a-random-sample-using-SURVEYSELECT-that-contains-at-least/m-p/915033#M360587</guid>
      <dc:creator>ghartge</dc:creator>
      <dc:date>2024-02-08T13:13:21Z</dc:date>
    </item>
  </channel>
</rss>

