<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40% in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461052#M284767</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;: I just ran a test with 3.8 million records. My brute force method selected 380,000, 760,000, 1,140,000 and 1,520,000 records for the four samples. The table method, in turn, selected 380,174, 760,300, 1,141,326 and 1,518,200 records for the four samples.&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;</description>
    <pubDate>Wed, 09 May 2018 14:35:14 GMT</pubDate>
    <dc:creator>art297</dc:creator>
    <dc:date>2018-05-09T14:35:14Z</dc:date>
    <item>
      <title>How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460958#M284755</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am trying to randomly split my dataset into 4 parts containing 10%, 20%, 30%, and 40% of the observations.&lt;/P&gt;&lt;P&gt;Please help.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Dinesh&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 11:34:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460958#M284755</guid>
      <dc:creator>dinesh_ltjd2</dc:creator>
      <dc:date>2018-05-09T11:34:58Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460961#M284756</link>
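      <description>&lt;P&gt;PROC SURVEYSELECT is what you want. For example, this draws a simple random sample of 100 observations:&lt;/P&gt;
&lt;PRE class="sascode"&gt;/* Simple random sample (SRS) of 100 rows from Customers */
proc surveyselect data=Customers
   method=srs n=100 out=SampleSRS;
run;&lt;/PRE&gt;
&lt;P&gt;It supports various selection methods:&lt;/P&gt;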
&lt;P&gt;&lt;A href="https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveyselect_sect007.htm" target="_blank"&gt;https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveyselect_sect007.htm&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 11:39:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460961#M284756</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2018-05-09T11:39:18Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460963#M284757</link>
      <description>&lt;P&gt;Thanks RW9!&lt;/P&gt;&lt;P&gt;Will this support multiple values for the sampling rate, like this?&lt;/P&gt;&lt;PRE&gt;samprate = (0.10 0.20 0.30 0.40)&lt;/PRE&gt;</description>
      <pubDate>Wed, 09 May 2018 11:51:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460963#M284757</guid>
      <dc:creator>dinesh_ltjd2</dc:creator>
      <dc:date>2018-05-09T11:51:01Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460972#M284758</link>
      <description>&lt;P&gt;Apparently so:&lt;BR /&gt;&lt;A href="https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_surveyselect_sect023.htm" target="_blank"&gt;https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_surveyselect_sect023.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;You will get one dataset with a variable indicating the group, which you can then use for BY-group processing.&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 12:20:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460972#M284758</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2018-05-09T12:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460977#M284759</link>
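      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a b c d;
  set sashelp.air;
  /* Seed the random-number stream once, before the first RAND call */
  if _n_ = 1 then call streaminit(123456780);
  /* Returns 1, 2 or 3 with probability .1, .2, .3, and 4 with the remaining .4 */
  n = rand('table', .1, .2, .3);
  if n = 1 then output a;
  else if n = 2 then output b;
  else if n = 3 then output c;
  else output d;
  drop n;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>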
      <pubDate>Wed, 09 May 2018 12:30:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/460977#M284759</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-05-09T12:30:31Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461021#M284760</link>
      <description>&lt;P&gt;I haven't compared this with PROC SURVEYSELECT, but I was intrigued by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;'s suggestion of using RAND's table option.&lt;/P&gt;
&lt;P&gt;Unfortunately, I didn't like the results it produced compared with taking matters into one's own hands. I'd suggest comparing the results of the two approaches below, as well as those obtained with PROC SURVEYSELECT.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Approach 1: attach a random number to each row, sort, then cut by position */
data forsample;
  set sashelp.class;
  randnum=rand('uniform');
run;

proc sort data=forsample;
  by randnum;
run;

/* Cumulative cut points: first 10%, then up to 30%, then up to 60%, then the rest */
data asample10 asample20 asample30 asample40;
  set forsample nobs=n;
  if _n_ le round(n*.1) then output asample10;
  else if _n_ le round(n*.3) then output asample20;
  else if _n_ le round(n*.6) then output asample30;
  else output asample40;
run;

/* Approach 2: assign each row independently with RAND('table') */
data bsample10 bsample20 bsample30 bsample40;
  set sashelp.class;
  n=rand('table',.1,.2,.3,.4);
  if n=1 then output bsample10;
  else if n=2 then output bsample20;
  else if n=3 then output bsample30;
  else output bsample40;
  drop n;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
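&lt;P&gt;For the PROC SURVEYSELECT side of the comparison, a minimal sketch might look like the following. It assumes a SAS/STAT release in which PROC SURVEYSELECT supports the GROUPS= option and adds a GroupID variable to the output. The group sizes (2, 4, 5, 8, which match what the sort-based step above yields for the 19 rows of sashelp.class), the seed, and the data set names grouped and csample10 to csample40 are illustrative choices only.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Randomly partition sashelp.class into groups of fixed size.             */
/* Assumption: GROUPS=(values) takes group sizes that sum to the row count. */
proc surveyselect data=sashelp.class groups=(2 4 5 8)
   seed=12345 out=grouped;
run;

/* Assumption: GroupID is numbered 1 to 4 in the order the sizes were listed */
data csample10 csample20 csample30 csample40;
  set grouped;
  if GroupID=1 then output csample10;
  else if GroupID=2 then output csample20;
  else if GroupID=3 then output csample30;
  else output csample40;
  drop GroupID;
run;
&lt;/CODE&gt;&lt;/PRE&gt;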
</description>
      <pubDate>Wed, 09 May 2018 13:52:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461021#M284760</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-09T13:52:03Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461037#M284761</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13711"&gt;@art297&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;What did you not like about the results of &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;'s suggestion?&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 14:18:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461037#M284761</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-09T14:18:32Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461040#M284762</link>
      <description>I am still working on my code, based on the suggestions by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt; and &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13711"&gt;@art297&lt;/a&gt;...&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dinesh</description>
      <pubDate>Wed, 09 May 2018 14:20:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461040#M284762</guid>
      <dc:creator>dinesh_ltjd2</dc:creator>
      <dc:date>2018-05-09T14:20:20Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461041#M284763</link>
      <description>&lt;P&gt;I think&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13711"&gt;@art297&lt;/a&gt;&amp;nbsp;might say that RAND('table') is not well suited to small tables.&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 14:21:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461041#M284763</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-05-09T14:21:09Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461043#M284764</link>
      <description>My dataset is quite big, with ~18 million observations.&lt;BR /&gt;</description>
      <pubDate>Wed, 09 May 2018 14:22:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461043#M284764</guid>
      <dc:creator>dinesh_ltjd2</dc:creator>
      <dc:date>2018-05-09T14:22:19Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461047#M284765</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;: I ran it a couple of times. In one run, sample10 had two obs; in the other, it had none. In both runs, sample20 had just 1 obs. I expected sample10 to always have at least one obs, and sample20 to have at least 3 obs.&lt;/P&gt;
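&lt;P&gt;To put rough numbers on that: RAND('table') assigns each of the 19 observations in sashelp.class independently, so small groups can easily come up short or empty. A sketch of the arithmetic, assuming independent draws with probabilities .1 and .2 for the first two groups (the variable names are only for illustration):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  n = 19;                                    /* observations in sashelp.class */
  p_empty10 = (1 - .1)**n;                   /* chance the 10% group gets no obs */
  p_atmost1_20 = cdf('binomial', 1, .2, n);  /* chance the 20% group gets 0 or 1 obs */
  put p_empty10= p_atmost1_20=;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The first value is 0.9**19, roughly 0.135, so an empty 10% sample should turn up in about one run out of seven.&lt;/P&gt;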
</description>
      <pubDate>Wed, 09 May 2018 14:26:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461047#M284765</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-09T14:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461051#M284766</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;If one wants to guarantee exact sample proportions, just update the ratios as the samples are built. Using your rand/table approach:&lt;/P&gt;
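&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a b c d;
  set sashelp.air nobs=n_avail;
  if _n_=1 then do;
    /* Seed the random-number stream once */
    call streaminit(123456780);
    /* need{i} = number of rows still to be assigned to group i */
    array need {1:4} _temporary_;
    need{1} = round(.1*n_avail);
    need{2} = round(.2*n_avail);
    need{3} = round(.3*n_avail);
    need{4} = n_avail-sum(of need{*});  /* remainder goes to the last group */
  end;

  /* Draw a group with probability proportional to its remaining quota */
  n=rand('table',need{1}/n_avail,need{2}/n_avail,need{3}/n_avail);
  if n=1 then output a;
  else if n=2 then output b;
  else if n=3 then output c;
  else output d;

  /* One fewer row needed in that group, one fewer row left to assign */
  need{n}=need{n}-1;
  n_avail+(-1);
  drop n;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>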
      <pubDate>Wed, 09 May 2018 14:34:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461051#M284766</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-09T14:34:59Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461052#M284767</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;: I just ran a test with 3.8 million records. My brute force method selected 380,000, 760,000, 1,140,000 and 1,520,000 records for the four samples. The table method, in turn, selected 380,174, 760,300, 1,141,326 and 1,518,200 records for the four samples.&lt;/P&gt;
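&lt;P&gt;Those differences are about what independent draws would produce. As a rough check, assuming each record is assigned independently with probabilities .1, .2, .3 and .4, each group count is binomial with standard deviation sqrt(n*p*(1-p)). The sketch below (variable names are illustrative) prints the expected count and that standard deviation for each group:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  n = 3800000;                 /* records in the test above */
  do p = .1, .2, .3, .4;
    expected = n*p;            /* target group size */
    sd = sqrt(n*p*(1 - p));    /* binomial standard deviation of the count */
    put p= expected= sd=;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For the 10% group, for example, the standard deviation is about 585 records, so landing 174 above 380,000 is unremarkable. All four table-method counts fall within about two standard deviations of their targets.&lt;/P&gt;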
</description>
      <pubDate>Wed, 09 May 2018 14:35:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461052#M284767</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-09T14:35:14Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461054#M284768</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;: That would work, and it is definitely faster than first assigning random numbers and then sorting the file. However, I would have thought that rand's table option would already have such logic built in. Obviously, it doesn't!&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 14:45:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461054#M284768</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-09T14:45:47Z</dc:date>
    </item>
    <item>
      <title>Re: How do I randomly split my dataset into 4 parts like 10%, 20%, 30%, 40%</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461149#M284769</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13711"&gt;@art297&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;"However, I would have thought that rand's table option would already have such logic built in."&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;But the RAND function is just a random number generator. It doesn't know, and should not assume, that I need to update the probabilities as I progress through the dataset.&lt;/P&gt;
&lt;P&gt;It's essentially sampling with replacement (&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;'s original post) vs. sampling without replacement per my suggestion.&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 18:04:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-randomly-split-my-dataset-ib-4-parts-like-10-20-30-40/m-p/461149#M284769</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-09T18:04:00Z</dc:date>
    </item>
  </channel>
</rss>

