<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Randomly selecting 5 case per cluster in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Randomly-selecting-5-case-per-cluster/m-p/473827#M121630</link>
    <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on a case-control study. For each case I have 20 controls, but I would like to select a smaller sample of 5 controls per case. How can I select them randomly without replacement? I have googled it, and it seems that proc surveyselect could be a good solution, but I don't know how to specify the parameters to get what I want. Does anyone have any ideas?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A sample dataset would look like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have;&lt;/P&gt;&lt;P&gt;input n id;&lt;/P&gt;&lt;P&gt;datalines;&lt;/P&gt;&lt;P&gt;1 12&lt;/P&gt;&lt;P&gt;1 13&lt;/P&gt;&lt;P&gt;1 14&lt;/P&gt;&lt;P&gt;1 15&lt;/P&gt;&lt;P&gt;1 16&lt;/P&gt;&lt;P&gt;1 17&lt;/P&gt;&lt;P&gt;1 18&lt;/P&gt;&lt;P&gt;2 35&lt;/P&gt;&lt;P&gt;2 40&lt;/P&gt;&lt;P&gt;2 56&lt;/P&gt;&lt;P&gt;2 57&lt;/P&gt;&lt;P&gt;2 58&lt;/P&gt;&lt;P&gt;2 59&lt;/P&gt;&lt;P&gt;2 60&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here n is the case id and id is the control id. I would like to select 5 controls per case at random.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 27 Jun 2018 17:07:05 GMT</pubDate>
    <dc:creator>ncy</dc:creator>
    <dc:date>2018-06-27T17:07:05Z</dc:date>
    <item>
      <title>Randomly selecting 5 case per cluster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Randomly-selecting-5-case-per-cluster/m-p/473827#M121630</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on a case-control study. For each case I have 20 controls, but I would like to select a smaller sample of 5 controls per case. How can I select them randomly without replacement? I have googled it, and it seems that proc surveyselect could be a good solution, but I don't know how to specify the parameters to get what I want. Does anyone have any ideas?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A sample dataset would look like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have;&lt;/P&gt;&lt;P&gt;input n id;&lt;/P&gt;&lt;P&gt;datalines;&lt;/P&gt;&lt;P&gt;1 12&lt;/P&gt;&lt;P&gt;1 13&lt;/P&gt;&lt;P&gt;1 14&lt;/P&gt;&lt;P&gt;1 15&lt;/P&gt;&lt;P&gt;1 16&lt;/P&gt;&lt;P&gt;1 17&lt;/P&gt;&lt;P&gt;1 18&lt;/P&gt;&lt;P&gt;2 35&lt;/P&gt;&lt;P&gt;2 40&lt;/P&gt;&lt;P&gt;2 56&lt;/P&gt;&lt;P&gt;2 57&lt;/P&gt;&lt;P&gt;2 58&lt;/P&gt;&lt;P&gt;2 59&lt;/P&gt;&lt;P&gt;2 60&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here n is the case id and id is the control id. I would like to select 5 controls per case at random.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jun 2018 17:07:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Randomly-selecting-5-case-per-cluster/m-p/473827#M121630</guid>
      <dc:creator>ncy</dc:creator>
      <dc:date>2018-06-27T17:07:05Z</dc:date>
    </item>
    <item>
      <title>Re: Randomly selecting 5 case per cluster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Randomly-selecting-5-case-per-cluster/m-p/473833#M121636</link>
      <description>&lt;PRE&gt;/* simple random sample without replacement: 5 records per value of n */
proc surveyselect data=have method=srs
   out=selected sampsize=5 outall;
   strata n;
run;&lt;/PRE&gt;
&lt;P&gt;As a rule, when you want nn records per value of a variable, that variable goes on the STRATA statement in surveyselect. The input set has to be sorted by the strata variable. The sampsize option gives how many records are desired per stratum. If you want different sizes per stratum, list the sizes &lt;STRONG&gt;in order of the strata variable values&lt;/STRONG&gt;: for example, sampsize=(5 6 4) says take 5 from the first stratum, 6 from the second and 4 from the last.&lt;/P&gt;
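&lt;P&gt;As a minimal sketch of the sorting and per-stratum sizes described above, assuming the sample data set have from the question (two strata, n=1 and n=2). The sizes 5 and 3, the output name selected_sizes and the seed value are made up for illustration; seed= is there only to make the draw repeatable.&lt;/P&gt;
&lt;PRE&gt;/* sort by the strata variable first */
proc sort data=have;
   by n;
run;

/* take 5 records where n=1 and 3 where n=2 (illustrative sizes) */
proc surveyselect data=have method=srs
   out=selected_sizes sampsize=(5 3) seed=12345;
   strata n;
run;&lt;/PRE&gt;
&lt;P&gt;Without OUTALL, the output set holds only the selected records.&lt;/P&gt;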
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I used the OUTALL option to create a set with all of your starting records plus an added variable named Selected, which has a value of 1 for the selected records. Notice that SAS also adds a selection probability and a sampling weight. Feel free to drop them if you aren't going to use the weights for anything later.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jun 2018 17:20:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Randomly-selecting-5-case-per-cluster/m-p/473833#M121636</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-06-27T17:20:12Z</dc:date>
    </item>
  </channel>
</rss>

