<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splitting a dataset depending upon a variable value in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731441#M227835</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a data set with about 130K records, and I want to draw a random sample of 11,000 records based on the city flag: 40% from City A, 40% from City B and 20% from City C (4400, 4400 and 2200 records). I could use the OBS= option, but I want these records selected randomly. Is there a way to do this with PROC SURVEYSELECT, or is there another method? Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Mon, 05 Apr 2021 20:50:09 GMT</pubDate>
    <dc:creator>ARTI1</dc:creator>
    <dc:date>2021-04-05T20:50:09Z</dc:date>
    <item>
      <title>Splitting a dataset depending upon a variable value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731441#M227835</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a data set with about 130K records, and I want to draw a random sample of 11,000 records based on the city flag: 40% from City A, 40% from City B and 20% from City C (4400, 4400 and 2200 records). I could use the OBS= option, but I want these records selected randomly. Is there a way to do this with PROC SURVEYSELECT, or is there another method? Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Apr 2021 20:50:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731441#M227835</guid>
      <dc:creator>ARTI1</dc:creator>
      <dc:date>2021-04-05T20:50:09Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a dataset depending upon a variable value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731446#M227836</link>
      <description>&lt;P&gt;Hi &lt;A class="trigger-hovercard" style="color: #007dc3;" href="https://communities.sas.com/t5/user/viewprofilepage/user-id/192517" target="_blank"&gt;ARTI1&lt;/A&gt;,&lt;/P&gt;
&lt;P&gt;It's not clear whether you want to split the data set based on a variable's values or on sample size. If you want to split based on sample size, please read this blog post: &lt;A href="https://blogs.sas.com/content/sgf/2020/07/23/splitting-a-data-set-into-smaller-data-sets/" target="_self"&gt;Splitting a data set into smaller data sets&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;In it, see the section "Splitting a data set into smaller data sets randomly".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want to split a data set based on the values of a categorical variable, see the &lt;A href="https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/" target="_blank"&gt;How to split one data set into many&lt;/A&gt; blog post.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Apr 2021 21:03:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731446#M227836</guid>
      <dc:creator>LeonidBatkhan</dc:creator>
      <dc:date>2021-04-05T21:03:44Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a dataset depending upon a variable value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731450#M227838</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/192517"&gt;@ARTI1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;I have a data set with about 130K records, and I want to draw a random sample of 11,000 records based on the city flag: 40% from City A, 40% from City B and 20% from City C (4400, 4400 and 2200 records). I could use the OBS= option, but I want these records selected randomly. Is there a way to do this with PROC SURVEYSELECT, or is there another method? Thanks in advance!&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This sounds to me like a stratified sampling problem. Something like:&lt;/P&gt;
&lt;PRE&gt;Proc sort data=have;
   by city;
run;

proc surveyselect data=have out=selected
   samprate=(40 40 20);
   strata city;
run;&lt;/PRE&gt;
&lt;P&gt;would be a generic approach. Watch the order of the SAMPRATE values: they need to match the &lt;STRONG&gt;sorted order&lt;/STRONG&gt; of the stratification variable. So if your CITY values are character, their alphabetical order decides which rate applies: the first sorted name is sampled at 40%, the second at 40% and the third at 20%. Also note that each SAMPRATE value applies to that stratum's own record count, so 40 selects 40% of City A's records, not 40% of the 11,000. To get exactly 4400, 4400 and 2200 records, specify SAMPSIZE=(4400 4400 2200) instead, listing the sizes in the same sorted order.&lt;/P&gt;
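&lt;P&gt;A minimal sketch of the SAMPSIZE= version, assuming the data set is HAVE and the stratification variable is CITY with the values A, B and C. Those values and the synthetic data step that creates HAVE are hypothetical stand-ins for your own data, and the SEED= value is arbitrary (it only makes the sample reproducible).&lt;/P&gt;
&lt;PRE&gt;/* Hypothetical test data: 130,000 records, CITY = A, B or C (about 40/40/20). */
/* Replace this step with your own HAVE data set.                              */
data have;
   call streaminit(2021);
   do id = 1 to 130000;
      u = rand('uniform');
      if u lt 0.4 then city = 'A';
      else if u lt 0.8 then city = 'B';
      else city = 'C';
      output;
   end;
   drop u;
run;

/* STRATA requires the input to be sorted by the strata variable */
proc sort data=have;
   by city;
run;

proc surveyselect data=have out=selected
   method=srs                  /* simple random sampling within each stratum   */
   sampsize=(4400 4400 2200)   /* one size per stratum, in sorted CITY order    */
   seed=12345;                 /* arbitrary fixed seed for a reproducible draw  */
   strata city;
run;

/* Check: should show 4400, 4400 and 2200 records */
proc freq data=selected;
   tables city / nocum nopercent;
run;&lt;/PRE&gt;
&lt;P&gt;The PROC FREQ step is only a check. Each city needs at least as many records as its requested sample size, because by default PROC SURVEYSELECT does not allow a stratum sample size larger than the stratum itself for METHOD=SRS.&lt;/P&gt;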
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The sort is needed because PROC SURVEYSELECT expects the input data to be sorted by the STRATA variable. Of course, use your own data set and variable names.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Apr 2021 21:32:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Splitting-a-dataset-depending-upon-a-variable-value/m-p/731450#M227838</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-04-05T21:32:10Z</dc:date>
    </item>
  </channel>
</rss>

