<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Random sample patient record with predefined distribution rate from each year in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631785#M187215</link>
    <description>&lt;P&gt;"&lt;SPAN&gt;I want to get a random sample of patients coming from each year (2015 to 2018) at a 38, 22,19,21% respectively &lt;STRONG&gt;without repeating the same patient ID&lt;/STRONG&gt;.&lt;/SPAN&gt;"&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does that mean you only consider a specific patient ID for your sample in the first year it appears in your source data? Or does it mean that an ID stays up for grabs as long as you haven't already selected it in another year, and you just don't want repeated IDs in your sample?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And depending on your answer:&lt;/P&gt;
&lt;P&gt;What does 21% for your last year mean? 21% of the total rows in your source, 21% of the source rows for this specific year (with the "excluded" IDs counted or not?), or 21% of the rows in the sample coming from the last year?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How did you come up with these percentages per year in the first place? Are they based on your current source data, and you just want to end up with the same number of patients per year in your sample?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is an attempt to create sample HAVE data for your case. Can you please verify whether this data is suitable?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* create sample Have data */
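data _null_;
  length year id 8;
  /* hash keyed on year and id, so each id occurs at most once per year */
  dcl hash h1(multidata:'n');
  h1.defineKey('year','id');
  h1.defineData('year','id');
  h1.defineDone();
  call streaminit(2);
  /* years match the 2015 to 2018 range in the question */
  do year=2015 to 2018;
    /* rand('integer') needs a recent SAS 9.4 maintenance release */
    _stop=rand('integer',1000,3000);
    do _j=1 to _stop;
      id=rand('integer',1,10000);
      /* ref() adds the key only if it is not already in the hash */
      _rc=h1.ref();
    end;
  end;
  h1.output(dataset:'have');
  stop;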
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 13 Mar 2020 05:22:01 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2020-03-13T05:22:01Z</dc:date>
    <item>
      <title>Random sample patient record with predefined distribution rate from each year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631677#M187178</link>
      <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Patient membership records spanning from 2015 to 2018, not every patient would have all years of membership enrollment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Want:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to get a random sample of patients coming from each year (2015 to 2018) at a 38, 22,19,21% respectively without repeating the same patient ID.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is it possible to do all of these in one proc?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2020 19:09:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631677#M187178</guid>
      <dc:creator>Sujithpeta</dc:creator>
      <dc:date>2020-03-12T19:09:24Z</dc:date>
    </item>
    <item>
      <title>Re: Random sample patient record with predefined distribution rate from each year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631717#M187189</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt;Is it possible to do all of these in one proc?&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I don't think so.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;gt;not every patient would have all years of membership enrollment.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;No preference in terms of percentage of various patient tenures?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2020 21:59:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631717#M187189</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-03-12T21:59:01Z</dc:date>
    </item>
    <item>
      <title>Re: Random sample patient record with predefined distribution rate from each year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631775#M187213</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You have not provided a sample data set, so my suggestion is totally untested.&amp;nbsp; I presume you have a data set with ID and YEAR variables (or a date variable from which YEAR can be extracted).&amp;nbsp; Each ID may have any number of records (including zero records) in each year.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You want a random sample (at different sampling rates) for each of 4 years.&amp;nbsp; And if an ID is drawn for one year, it is not eligible to be drawn from another year.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is conceivable that this is not possible.&amp;nbsp; Consider exactly 100 patients, each with one record in each year.&amp;nbsp; Then your samples of 38%, 22%, 19% and 21% mean you would draw one record from each of the patients.&amp;nbsp; Now imagine that the 38% year (call it year X) is missing from the "last" ID (i.e. the ID is present only in the other 3 years).&amp;nbsp; The random sample size of 38% of 99 is still presumably 38 obs.&amp;nbsp; If your randomization scheme, over the course of the first 99 draws, selects a complete complement for the other years, and 37 for year X, then the 100th observation is not sampled: it is not available for year X and it is not needed for the other years.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I.e. it is possible your data may be pathological enough to make it impossible to get the sample you want, even if all the sampling rates were an identical 25%.&amp;nbsp; This is because the same ID may be present in multiple years, yet is not allowed in more than one stratum (i.e. one year).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This task will probably require some data step coding.&lt;/P&gt;
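&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal, untested sketch of one way this could go, under assumptions that are mine and not confirmed by the original question: the input is a data set HAVE with one row per ID and YEAR, the percentages describe the composition of the final sample, and the target is a hypothetical 100 patients in total. The names ONE_YEAR and WANT and the seed are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 1: attach a random number to every ID/YEAR row */
data one_year;
  set have;
  if _n_=1 then call streaminit(2020);
  _u=rand('uniform');
run;

/* Step 2: keep one randomly chosen year per ID, so no ID can repeat */
proc sort data=one_year;
  by id _u;
run;

data one_year;
  set one_year;
  by id;
  if first.id;
  drop _u;
run;

/* Step 3: draw 38, 22, 19 and 21 IDs from the 2015 to 2018 strata */
proc sort data=one_year;
  by year;
run;

proc surveyselect data=one_year out=want
                  method=srs n=(38 22 19 21) seed=2020;
  strata year;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Reducing each ID to a single year up front shrinks every stratum, so the problem described above can still show up: as far as I know, PROC SURVEYSELECT stops with an error when a stratum holds fewer IDs than the requested size, unless the SELECTALL option is specified.&lt;/P&gt;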
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 02:56:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631775#M187213</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-03-13T02:56:15Z</dc:date>
    </item>
    <item>
      <title>Re: Random sample patient record with predefined distribution rate from each year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631785#M187215</link>
      <description>&lt;P&gt;"&lt;SPAN&gt;I want to get a random sample of patients coming from each year (2015 to 2018) at a 38, 22,19,21% respectively &lt;STRONG&gt;without repeating the same patient ID&lt;/STRONG&gt;.&lt;/SPAN&gt;"&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does that mean you only consider a specific patient ID for your sample in the first year it appears in your source data? Or does it mean that an ID stays up for grabs as long as you haven't already selected it in another year, and you just don't want repeated IDs in your sample?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And depending on your answer:&lt;/P&gt;
&lt;P&gt;What does 21% for your last year mean? 21% of the total rows in your source, 21% of the source rows for this specific year (with the "excluded" IDs counted or not?), or 21% of the rows in the sample coming from the last year?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How did you come up with these percentages per year in the first place? Are they based on your current source data, and you just want to end up with the same number of patients per year in your sample?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is an attempt to create sample HAVE data for your case. Can you please verify whether this data is suitable?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* create sample Have data */
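data _null_;
  length year id 8;
  /* hash keyed on year and id, so each id occurs at most once per year */
  dcl hash h1(multidata:'n');
  h1.defineKey('year','id');
  h1.defineData('year','id');
  h1.defineDone();
  call streaminit(2);
  /* years match the 2015 to 2018 range in the question */
  do year=2015 to 2018;
    /* rand('integer') needs a recent SAS 9.4 maintenance release */
    _stop=rand('integer',1000,3000);
    do _j=1 to _stop;
      id=rand('integer',1,10000);
      /* ref() adds the key only if it is not already in the hash */
      _rc=h1.ref();
    end;
  end;
  h1.output(dataset:'have');
  stop;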
run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 05:22:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/631785#M187215</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-03-13T05:22:01Z</dc:date>
    </item>
    <item>
      <title>Re: Random sample patient record with predefined distribution rate from each year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/632019#M187332</link>
      <description>&lt;P&gt;The code you shared threw an error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is how the data is structured:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ID&amp;nbsp; &amp;nbsp; &amp;nbsp;Year&lt;/P&gt;&lt;P&gt;A&amp;nbsp; &amp;nbsp; &amp;nbsp; 2015&lt;/P&gt;&lt;P&gt;A&amp;nbsp; &amp;nbsp; &amp;nbsp; 2016&lt;/P&gt;&lt;P&gt;B&amp;nbsp; &amp;nbsp; &amp;nbsp; 2015&lt;/P&gt;&lt;P&gt;B&amp;nbsp; &amp;nbsp; &amp;nbsp; 2017&lt;/P&gt;&lt;P&gt;B&amp;nbsp; &amp;nbsp; &amp;nbsp; 2018&lt;/P&gt;&lt;P&gt;C&amp;nbsp; &amp;nbsp; &amp;nbsp; 2016&lt;/P&gt;&lt;P&gt;D&amp;nbsp; &amp;nbsp; &amp;nbsp; 2018&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the sample, a patient ID should not repeat within the same year or across years.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The percentages come from a case group whose disease index years are distributed at the mentioned rates.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does this help?&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 18:42:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/632019#M187332</guid>
      <dc:creator>Sujithpeta</dc:creator>
      <dc:date>2020-03-13T18:42:00Z</dc:date>
    </item>
    <item>
      <title>Re: Random sample patient record with predefined distribution rate from each year</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/632096#M187365</link>
      <description>&lt;P&gt;The code I've shared works for me as posted. It looks like your SAS version is too old for something in the code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I still don't understand where the percentages would need to be applied, and you haven't explained this further or answered my questions.&lt;/P&gt;
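&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the error comes from rand('integer', ...), which as far as I know was only added in SAS 9.4M5, here is a sketch of the same sample data step that relies on rand('uniform') instead. This is a guess at the cause, not a confirmed diagnosis, and the generated IDs will differ from those of the original code. The years are set to 2015 to 2018 to match the question.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* create sample Have data without rand('integer') */
data _null_;
  length year id 8;
  dcl hash h1(multidata:'n');
  h1.defineKey('year','id');
  h1.defineData('year','id');
  h1.defineDone();
  call streaminit(2);
  do year=2015 to 2018;
    /* random integer from 1000 to 3000 */
    _stop=1000+floor(rand('uniform')*2001);
    do _j=1 to _stop;
      /* random integer from 1 to 10000 */
      id=ceil(rand('uniform')*10000);
      /* ref() adds the year/id pair only if it is not already stored */
      _rc=h1.ref();
    end;
  end;
  h1.output(dataset:'have');
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If this version still fails, the log message would show which statement is the problem.&lt;/P&gt;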
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 14 Mar 2020 02:11:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Random-sample-patient-record-with-predefined-distribution-rate/m-p/632096#M187365</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-03-14T02:11:44Z</dc:date>
    </item>
  </channel>
</rss>

