<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Random sampling with parameters in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/763348#M241750</link>
    <description>This also works. Thank you!!</description>
    <pubDate>Mon, 23 Aug 2021 20:02:55 GMT</pubDate>
    <dc:creator>novicenoice</dc:creator>
    <dc:date>2021-08-23T20:02:55Z</dc:date>
    <item>
      <title>Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762629#M241481</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a case data set with 70 hospitals and 5 disease types. Some hospitals have fewer than 5 cases for the year, some hospitals have many cases. Some hospitals do not have all 5 disease types, some hospitals have all disease types.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to write code to randomly sample 5 cases per hospital. However, there are parameters to how the cases can be chosen. If a hospital has all 5 disease types, I want to randomly select one case per disease type. If there are more than 5 cases and only 4 disease types, I want to capture one case of each of the 4 types, and the fifth case must be chosen but its disease type doesn't matter. If there are only 3 cases, I want to capture all 3 (random sampling doesn't really matter here anymore). And so on and so forth. Random sampling is only for concordance purposes, so I don't truly need a randomized sample.&amp;nbsp;&lt;EM&gt;However&lt;/EM&gt;, there are hospitals with more than 100 cases, with more than 15 of each disease type, and I'd like those selections to be random (not the first case SAS reads).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ultimately, the number of cases per hospital determines whether I can even sample 5 (if there are fewer, I'll take all). Within that limit, I want to select at least one case of each disease type per hospital and end up with 5 cases from each hospital.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With 70 hospitals and 5 disease types, if there were at least 5 cases per hospital, I'd have an end sample size of 350. 
However, that is not always the situation because some hospitals may have fewer than 5 cases.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;hosp_id  disease_type
A        1
A        1
A        2
A        5
B        1
B        2
B        3
B        3
B        4
B        5
...
C        1
C        1
C        1
C        2
C        2
C        2
C        4
C        5
...&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help greatly appreciated! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Aug 2021 18:20:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762629#M241481</guid>
      <dc:creator>novicenoice</dc:creator>
      <dc:date>2021-08-19T18:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762645#M241484</link>
      <description>With that many custom rules I think you have to go through and implement a manual selection. I also wouldn't call it random &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Thu, 19 Aug 2021 18:49:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762645#M241484</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-08-19T18:49:55Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762646#M241485</link>
      <description>I can definitely see manual selection happening for hospitals with smaller case counts.&lt;BR /&gt;The last thing I tried was to split the dataset, find the hospitals with either 5 or fewer cases or fewer than 5 disease types, and select those manually.&lt;BR /&gt;Then I would randomly select from hospitals with more cases and all 5 disease types. If I don't select randomly (with PROC SURVEYSELECT), SAS just takes the first case it sees.</description>
      <pubDate>Thu, 19 Aug 2021 18:55:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762646#M241485</guid>
      <dc:creator>novicenoice</dc:creator>
      <dc:date>2021-08-19T18:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762661#M241490</link>
      <description>Actually, the SELECTALL option in PROC SURVEYSELECT handles that scenario fine. &lt;BR /&gt;&lt;BR /&gt;This is the part that's complicated:&lt;BR /&gt;I am trying to write code to randomly sample 5 cases per hospital. However, there are parameters to how the cases can be chosen. If a hospital has all 5 disease types, I want to randomly select one disease type per hospital. If there are more than 5 cases and only 4 disease types, I want to capture the 4 different cases, and the fifth case must be chosen but disease type doesn't matter. If there are only 3 cases, I want to capture all 3 (randomly sampling doesn't really matter here anymore).</description>
      <pubDate>Thu, 19 Aug 2021 19:28:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762661#M241490</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-08-19T19:28:19Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762914#M241578</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/393968"&gt;@novicenoice&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure if PROC SURVEYSELECT would be particularly useful for your requirements, so here's a suggestion using the traditional technique of sorting by random numbers, balancing the disease types within each hospital (as far as possible). For example, if a hospital had ten patients -- two with disease type 1 and the rest with disease type 4 -- the first two would be selected with certainty, plus a random sample of three out of the remaining eight. If &lt;EM&gt;three&lt;/EM&gt; (rather than two) of the ten patients had disease type 1, it would be decided randomly (with probability 1/2) whether all three or only a random sample of two were included in the final sample of five.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create sample data for demonstration */

data have;
call streaminit(27182818);
do hosp_id=1 to 70;
  do _n_=1 to ceil(1/rand('expo',.2));
    disease_type=rand('table',.2,.15,.25,.3,.1);
    usubjid+1;
    output;
  end;
end;
run;

proc sort data=have;
by hosp_id disease_type;
run;

/* Create a random sort order within the (HOSP_ID, DISEASE_TYPE) strata */

data temp;
call streaminit('MT64',27182818);
set have;
_r1=rand('uniform');
run;

proc sort data=temp;
by hosp_id disease_type _r1;
run;

/* Number the observations sequentially within the strata */
/* in order to define selection priorities                */

data temp;
call streaminit('MT64',3141592);
set temp(drop=_r1);
by hosp_id disease_type;
if first.disease_type then _prio=1;
else _prio+1;
_r2=rand('uniform');
run;

/* Create a random sort order within the (HOSP_ID, _PRIO) groups */

proc sort data=temp;
by hosp_id _prio _r2;
run;

/* Select five observations per HOSP_ID if possible (otherwise all) */

data want(drop=_:);
set temp;
by hosp_id;
if first.hosp_id then _c=1;
else _c+1;
if _c&amp;lt;=5;
run;
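
/* Optional check, not part of the original suggestion (a sketch):    */
/* CHECK, N_CASES, N_TYPES, N_SAMPLED and N_TYPES_SAMPLED are          */
/* hypothetical names. The query lists hospitals whose sample size or  */
/* disease type coverage differs from what was intended, so no rows    */
/* are expected if the selection above behaved as described.           */

proc sql;
create table check as
select h.hosp_id, h.n_cases, h.n_types, w.n_sampled, w.n_types_sampled
from (select hosp_id, count(*) as n_cases,
             count(distinct disease_type) as n_types
      from have group by hosp_id) as h
     left join
     (select hosp_id, count(*) as n_sampled,
             count(distinct disease_type) as n_types_sampled
      from want group by hosp_id) as w
on h.hosp_id=w.hosp_id
where w.n_sampled ne min(h.n_cases,5)
   or w.n_types_sampled ne min(h.n_types,5);
quit;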

proc sort data=want;
by hosp_id disease_type usubjid;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 20 Aug 2021 18:32:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762914#M241578</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2021-08-20T18:32:35Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762920#M241582</link>
      <description>&lt;P&gt;Random sampling with at least one case per disease_type (where available) and a total of 5 per hosp_id:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input hosp_id $ disease_type; 
datalines;
A                1         
A                1 
A                2 
A                5 
B                1
B                2
B                3
B                3
B                4
B                5
C                1         
C                1
C                1
C                2
C                2
C                2
C                4
C                5
;

data haveRnd;
call streaminit(85865);
set have;
rnd = rand("uniform");
run;

proc sort data=haveRnd; by hosp_id disease_type rnd; run;

data haveFirst;
set haveRnd; by hosp_id disease_type;
first = first.disease_type;
run;

proc sort data=haveFirst; by hosp_id descending first rnd; run;

data want;
do order = 1 by 1 until (last.hosp_id);
    set haveFirst; by hosp_id;
    if order &amp;lt;= 5 then output;
    end;
drop rnd order first;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="PGStats_0-1629486155868.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/62784i9679636723790D2F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="PGStats_0-1629486155868.png" alt="PGStats_0-1629486155868.png" /&gt;&lt;/span&gt;&lt;/P&gt;
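&lt;P&gt;For comparison, a plain draw of five cases per hospital with PROC SURVEYSELECT might look like the sketch below. It assumes the HAVE dataset from above; HAVESORTED, SRS5 and the seed value are placeholders. SELECTALL keeps every case when a hospital has fewer than five, but nothing here forces each disease type into the sample, which is what the FIRST flag above takes care of.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: HAVESORTED, SRS5 and the seed are placeholders */
proc sort data=have out=haveSorted; by hosp_id; run;

proc surveyselect data=haveSorted out=srs5
    method=srs      /* simple random sampling without replacement */
    sampsize=5      /* five cases per hospital ...                */
    selectall       /* ... or all cases when a hospital has fewer */
    seed=85865;
strata hosp_id;     /* one independent draw per hospital          */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Comparing the disease types in SRS5 with those in WANT for the same hospital shows whether the extra sorting steps make a difference for your data.&lt;/P&gt;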
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Aug 2021 19:03:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/762920#M241582</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2021-08-20T19:03:19Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/763347#M241749</link>
      <description>Thank you so much!! This works!</description>
      <pubDate>Mon, 23 Aug 2021 20:01:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/763347#M241749</guid>
      <dc:creator>novicenoice</dc:creator>
      <dc:date>2021-08-23T20:01:55Z</dc:date>
    </item>
    <item>
      <title>Re: Random sampling with parameters</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/763348#M241750</link>
      <description>This also works. Thank you!!</description>
      <pubDate>Mon, 23 Aug 2021 20:02:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sampling-with-parameters/m-p/763348#M241750</guid>
      <dc:creator>novicenoice</dc:creator>
      <dc:date>2021-08-23T20:02:55Z</dc:date>
    </item>
  </channel>
</rss>

