<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Create random sample using the distribution of one dataset to randomly sample from another data set in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899015#M355352</link>
    <description>&lt;P&gt;Hi SAS Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a distribution of how often data occurs in certain regions in dataset 1. I am trying to create a stratified random sample of another dataset with the same region variable so that it has the same proportion of accounts from each region as dataset 1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to use PROC SURVEYSELECT to do this, but I am struggling to figure out how to tie the distribution from one dataset to another, and the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_surveyselect_overview.htm" target="_self"&gt;documentation&lt;/A&gt; has only made me more confused.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;An example of what I am trying to do is:&lt;/P&gt;&lt;P&gt;Dataset 1:&lt;/P&gt;&lt;P&gt;State Count Percent_of_total&lt;/P&gt;&lt;P&gt;PA&amp;nbsp; &amp;nbsp;10&amp;nbsp; &amp;nbsp;10%&lt;/P&gt;&lt;P&gt;NY&amp;nbsp; &amp;nbsp;30&amp;nbsp; &amp;nbsp;30%&lt;/P&gt;&lt;P&gt;DE&amp;nbsp; &amp;nbsp;60&amp;nbsp; &amp;nbsp;60%&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dataset 2 (N=200):&lt;/P&gt;&lt;P&gt;State Count Percent_of_total&lt;/P&gt;&lt;P&gt;PA&amp;nbsp; &amp;nbsp;40&amp;nbsp; &amp;nbsp;20%&lt;/P&gt;&lt;P&gt;NY&amp;nbsp; &amp;nbsp;50&amp;nbsp; &amp;nbsp;25%&lt;/P&gt;&lt;P&gt;DE&amp;nbsp; &amp;nbsp;110&amp;nbsp; &amp;nbsp;55%&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let's say I want to randomly sample 100 of the 200. I would want to match the proportions from dataset 1, so that PA makes up 10% of the sample of 100, NY 30%, and DE 60%. That means I would select 10 of the 40 PA records, 30 of the 50 NY records, and 60 of the 110 DE records.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope it is clear what I am trying to do. 
Happy to clarify if it is not.&lt;/P&gt;&lt;P&gt;Thanks so much for the help!&lt;/P&gt;</description>
    <pubDate>Tue, 17 Oct 2023 19:55:18 GMT</pubDate>
    <dc:creator>Tommy1</dc:creator>
    <dc:date>2023-10-17T19:55:18Z</dc:date>
    <item>
      <title>Create random sample using the distribution of one dataset to randomly sample from another data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899015#M355352</link>
      <description>&lt;P&gt;Hi SAS Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a distribution of how often data occurs in certain regions in dataset 1. I am trying to create a stratified random sample of another dataset with the same region variable so that it has the same proportion of accounts from each region as dataset 1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to use PROC SURVEYSELECT to do this, but I am struggling to figure out how to tie the distribution from one dataset to another, and the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_surveyselect_overview.htm" target="_self"&gt;documentation&lt;/A&gt; has only made me more confused.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;An example of what I am trying to do is:&lt;/P&gt;&lt;P&gt;Dataset 1:&lt;/P&gt;&lt;P&gt;State Count Percent_of_total&lt;/P&gt;&lt;P&gt;PA&amp;nbsp; &amp;nbsp;10&amp;nbsp; &amp;nbsp;10%&lt;/P&gt;&lt;P&gt;NY&amp;nbsp; &amp;nbsp;30&amp;nbsp; &amp;nbsp;30%&lt;/P&gt;&lt;P&gt;DE&amp;nbsp; &amp;nbsp;60&amp;nbsp; &amp;nbsp;60%&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dataset 2 (N=200):&lt;/P&gt;&lt;P&gt;State Count Percent_of_total&lt;/P&gt;&lt;P&gt;PA&amp;nbsp; &amp;nbsp;40&amp;nbsp; &amp;nbsp;20%&lt;/P&gt;&lt;P&gt;NY&amp;nbsp; &amp;nbsp;50&amp;nbsp; &amp;nbsp;25%&lt;/P&gt;&lt;P&gt;DE&amp;nbsp; &amp;nbsp;110&amp;nbsp; &amp;nbsp;55%&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let's say I want to randomly sample 100 of the 200. I would want to match the proportions from dataset 1, so that PA makes up 10% of the sample of 100, NY 30%, and DE 60%. That means I would select 10 of the 40 PA records, 30 of the 50 NY records, and 60 of the 110 DE records.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope it is clear what I am trying to do. 
Happy to clarify if it is not.&lt;/P&gt;&lt;P&gt;Thanks so much for the help!&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2023 19:55:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899015#M355352</guid>
      <dc:creator>Tommy1</dc:creator>
      <dc:date>2023-10-17T19:55:18Z</dc:date>
    </item>
    <item>
      <title>Re: Create random sample using the distribution of one dataset to randomly sample from another data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899028#M355357</link>
      <description>&lt;P&gt;PROC SURVEYSELECT assumes that you have some sort of population file that you want to select observations from.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I don't see a clear statement that you have such a data set.&lt;/P&gt;
&lt;P&gt;Do you want to select a COUNT or a PERCENTAGE of records? With SURVEYSELECT you pick one or the other.&lt;/P&gt;
&lt;P&gt;If you specify a percentage (SAMPRATE), that percentage is applied within each stratum, if there are any, which is almost certainly not what you want. If your population set is large enough and you select a random sample of size N without any strata, the proportions of a variable like state in the sample will be close to those in the population, but likely not exact.&lt;/P&gt;
&lt;P&gt;SAMPSIZE lets you specify an exact number for each stratum, which SAS will attempt to match if there are enough observations in the population.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One example:&lt;/P&gt;
&lt;PRE&gt;/* create a population data set */
data dummy;
  do i=1 to 10000;
     assign=rand('integer',3);
     select (assign);
        when(1) state='PA';
        when(2) state='NY';
        when(3) state='DE';
        otherwise;
     end;
     output;
  end;
run;
/* strata wants sorted data by the strata variable*/
proc sort data=dummy;
   by state;
run;
/* _NSIZE_ is the specific variable name that holds
   the count of observations to select; other names
   exist for other criteria.
   The values of the strata variable(s) must match
   those in the population set.
*/
data selectcontrol;
   input state :$2. _nsize_;
datalines;
PA 10
NY 30
DE 60
;
/* sort order for the control data set
   has to match the strata
*/
proc sort data=selectcontrol;
   by state;
run;


Proc surveyselect data=dummy out=selected
   sampsize=selectcontrol  /*&amp;lt;= this tells SAS to use the control set*/
;
strata state;
run;
&lt;/PRE&gt;
&lt;P&gt;The control data set is a bit of overkill once you understand how the STRATA and SAMPSIZE options interact. You can specify a list of values for SAMPSIZE, such as SAMPSIZE=(60 30 10). The first number is the number of observations to select from the first stratum, the second from the second stratum, the third from the third, and so on, with the strata taken in sorted order (here DE, NY, PA). This gets a bit more complicated with two strata variables, because each number has to line up with a combination of values. Look closely at your output data set from sorting the population to confirm the order.&lt;/P&gt;
&lt;P&gt;This will create a similar selection to the previous:&lt;/P&gt;
&lt;PRE&gt;Proc surveyselect data=dummy out=selected2
   sampsize=(60 30 10)  /* DE NY PA, in sorted strata order */
;
strata state;
run;&lt;/PRE&gt;
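&lt;P&gt;The control data set does not have to be typed by hand. A minimal sketch, assuming the reference distribution sits in a data set like dataset 1 from the question: the names dataset1, ref_count, selectcontrol2 and selected3, and the total of 100, are hypothetical choices for illustration. It reuses the sorted dummy population from the first example. Rounding can make the sizes add up to slightly more or less than the total, and each stratum in the population needs at least as many records as requested.&lt;/P&gt;
&lt;PRE&gt;/* hypothetical copy of dataset 1 from the question */
data dataset1;
   input state :$2. ref_count;
datalines;
PA 10
NY 30
DE 60
;

/* scale the reference counts to a total sample of 100;
   PROC SQL remerges sum(ref_count) onto every row,
   and ORDER BY puts the rows in strata order */
proc sql;
   create table selectcontrol2 as
   select state,
          round(100 * ref_count / sum(ref_count)) as _nsize_
   from dataset1
   order by state;
quit;

/* dummy must already be sorted by state */
proc surveyselect data=dummy out=selected3
   sampsize=selectcontrol2;
   strata state;
run;&lt;/PRE&gt;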
&lt;P&gt;If you want to be able to repeat the same selection with the exact same code, specify SEED=&amp;lt;some positive integer&amp;gt;. Note that changing the OS or SAS version is likely to &lt;STRONG&gt;not&lt;/STRONG&gt; duplicate the results, because of other factors out of your control.&lt;/P&gt;
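&lt;P&gt;A minimal sketch of a repeatable selection, again using the sorted dummy population from the first example. The seed value 12345 and the output name selected_seeded are arbitrary, hypothetical choices. The PROC FREQ step is only a check: it should list 60 DE, 30 NY and 10 PA records.&lt;/P&gt;
&lt;PRE&gt;/* same seed and same code give the same sample on the same system */
proc surveyselect data=dummy out=selected_seeded
   sampsize=(60 30 10)  /* DE NY PA, in sorted strata order */
   seed=12345;
   strata state;
run;

/* confirm how many records were selected per state */
proc freq data=selected_seeded;
   tables state;
run;&lt;/PRE&gt;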
</description>
      <pubDate>Tue, 17 Oct 2023 21:30:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899028#M355357</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-10-17T21:30:39Z</dc:date>
    </item>
    <item>
      <title>Re: Create random sample using the distribution of one dataset to randomly sample from another data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899157#M355397</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp;Thank you so much for the speedy reply! In my effort to create a simplified example to explain what I am trying to do with the data, I didn't show that I do in fact have a population file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You explain this so well and make it so easy to understand. I feel like the documentation was so confusing that I couldn't figure out which options to use. I was able to modify your code to do exactly what I wanted. I am creating a ton of different versions for what I need to do and this allows me to make all those different versions.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much for your help!&lt;/P&gt;</description>
      <pubDate>Wed, 18 Oct 2023 16:58:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Create-random-sample-using-the-distribution-of-one-dataset-to/m-p/899157#M355397</guid>
      <dc:creator>Tommy1</dc:creator>
      <dc:date>2023-10-18T16:58:20Z</dc:date>
    </item>
  </channel>
</rss>

