<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: split dataset into n folds in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420519#M103503</link>
    <description>Thanks. I thought about using PROC SURVEYSELECT. Can you please give an example for 3 folds? Source dataset: Have; output datasets: wants1, wants2, wants3. Thanks</description>
    <pubDate>Tue, 12 Dec 2017 16:50:47 GMT</pubDate>
    <dc:creator>csetzkorn</dc:creator>
    <dc:date>2017-12-12T16:50:47Z</dc:date>
    <item>
      <title>split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420508#M103499</link>
      <description>&lt;P&gt;I would like to split a given dataset into n stratified folds of roughly equal size by adding a column that holds the fold number (1 to n). What is a common, simple way to achieve this? Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 15:58:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420508#M103499</guid>
      <dc:creator>csetzkorn</dc:creator>
      <dc:date>2017-12-12T15:58:41Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420511#M103500</link>
      <description>&lt;P&gt;Generally, it's not a good idea.&amp;nbsp;&lt;/P&gt;
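&lt;P&gt;A minimal sketch of the usual alternative, offered as an assumption about what "not a good idea" refers to: keep one data set with a fold column and subset it with a WHERE= data set option whenever a particular fold is needed. FOLDED and FOLD are hypothetical names, SASHELP.CLASS stands in for real data, and the fold column here is a plain round-robin (not random) assignment.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical FOLDED: SASHELP.CLASS plus FOLD = 1, 2, 3, 1, 2, 3, ... (not randomized) */
data folded;
  set sashelp.class;
  fold = mod(_n_ - 1, 3) + 1;
run;

/* Fold 1 held out: analyse the other two folds without creating a new data set */
proc means data=folded(where=(fold ne 1)) n mean;
  var height;
run;

/* ...and the held-out fold 1 itself */
proc means data=folded(where=(fold eq 1)) n mean;
  var height;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Looping over folds then only changes the WHERE= value instead of multiplying data sets.&lt;/P&gt;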
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That being said, here are two write-ups on it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1.&amp;nbsp;&lt;A href="http://www.sascommunity.org/wiki/Split_Data_into_Subsets" target="_blank"&gt;http://www.sascommunity.org/wiki/Split_Data_into_Subsets&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;2.&amp;nbsp;&lt;A href="https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/" target="_blank"&gt;https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 16:09:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420511#M103500</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-12T16:09:29Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420513#M103501</link>
      <description>I came across these references when I googled. They do not seem to split the original dataset randomly, but rather based on column values. I would like to split randomly; never mind the stratification.</description>
      <pubDate>Tue, 12 Dec 2017 16:28:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420513#M103501</guid>
      <dc:creator>csetzkorn</dc:creator>
      <dc:date>2017-12-12T16:28:55Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420515#M103502</link>
      <description>&lt;P&gt;PROC SURVEYSELECT then? Choose N samples of X data?&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 16:30:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420515#M103502</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-12T16:30:35Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420519#M103503</link>
      <description>Thanks. I thought about using PROC SURVEYSELECT. Can you please give an example for 3 folds? Source dataset: Have; output datasets: wants1, wants2, wants3. Thanks</description>
      <pubDate>Tue, 12 Dec 2017 16:50:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420519#M103503</guid>
      <dc:creator>csetzkorn</dc:creator>
      <dc:date>2017-12-12T16:50:47Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420521#M103504</link>
      <description>&lt;P&gt;It won't create multiple data sets but will do the random selection. Then you can use the methods above to split.&lt;/P&gt;
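&lt;P&gt;For illustration, a sketch assuming a SAS/STAT release whose PROC SURVEYSELECT supports the GROUPS= option (older releases do not): GROUPS=3 randomly assigns every row to one of three groups and records the assignment in a GroupID variable, and a DATA step can then split on that value. HAVE, WANT and WANT1 to WANT3 follow the names used in this thread; SASHELP.CLASS stands in for real data and the seed is arbitrary.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Stand-in for your own HAVE */
data have;
  set sashelp.class;
run;

/* Randomly assign each row to group 1, 2 or 3 (stored in GroupID) */
proc surveyselect data=have out=want groups=3 seed=2017 noprint;
run;

/* Optional: one data set per group, as asked for in this thread */
data want1 want2 want3;
  set want;
  select (groupid);
    when (1) output want1;
    when (2) output want2;
    otherwise output want3;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;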
&lt;P&gt;Or do it manually: add a random number, sort by it, and then use any of the methods in the links above.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 16:57:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420521#M103504</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-12T16:57:41Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420530#M103505</link>
      <description>&lt;P&gt;There's no guarantee of a random-ish result, but&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
    set have;
    split = mod(_n_, 9);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;will add a variable that splits the data set into 9 parts (SPLIT takes the values 0 to 8), and the group sizes will differ by at most 1. Replace 9 with your desired number.&lt;/P&gt;
&lt;P&gt;If randomization is critical, add a variable holding the result of a random number function such as RAND('UNIFORM'), sort by that variable, and then use the method above.&lt;/P&gt;
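&lt;P&gt;The random-sort-then-MOD approach can be sketched as follows, producing a fold column with the values 1, 2 and 3. It assumes HAVE is your input (SASHELP.CLASS stands in here); the seed, the helper variable _RAND and the name FOLD are illustrative choices. MOD(_N_ - 1, 3) + 1 shifts the 0-based remainder so that folds are numbered from 1.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Stand-in for your own HAVE */
data have;
  set sashelp.class;
run;

/* 1. Attach a reproducible uniform random number to each row */
data shuffled;
  set have;
  if _n_ = 1 then call streaminit(2017);   /* arbitrary seed */
  _rand = rand('uniform');
run;

/* 2. Sorting on the random number shuffles the rows */
proc sort data=shuffled;
  by _rand;
run;

/* 3. Deal rows out round-robin: FOLD cycles 1, 2, 3, 1, 2, 3, ... */
data want(drop=_rand);
  set shuffled;
  fold = mod(_n_ - 1, 3) + 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With the 19 rows of SASHELP.CLASS, the folds receive 7, 6 and 6 rows.&lt;/P&gt;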
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 17:24:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420530#M103505</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-12-12T17:24:13Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420534#M103507</link>
      <description>That's fair enough, but can you please show code that creates a computed column containing 1, 2 and 3 to indicate the fold (see the original question)?</description>
      <pubDate>Tue, 12 Dec 2017 17:32:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420534#M103507</guid>
      <dc:creator>csetzkorn</dc:creator>
      <dc:date>2017-12-12T17:32:04Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420535#M103508</link>
      <description>&lt;P&gt;You should have enough information and examples here to write the code yourself, or at minimum to provide sample data.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 17:32:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420535#M103508</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-12T17:32:49Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420536#M103509</link>
      <description>Thanks. Randomness is important.</description>
      <pubDate>Tue, 12 Dec 2017 17:32:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420536#M103509</guid>
      <dc:creator>csetzkorn</dc:creator>
      <dc:date>2017-12-12T17:32:49Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420772#M103544</link>
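      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Attach a uniform random number to every observation */
data have;
 set sashelp.heart;
 call streaminit(12345678);   /* fixed seed so the split is reproducible */
 random=rand('uniform');
run;

/* GROUPS=3 bins RANDOM into three equal-sized-ish groups; GROUP takes the values 0, 1, 2 */
proc rank data=have out=temp groups=3;
var random;
ranks group;
run;

/* Write each rank group to its own data set */
data want1 want2 want3;
 set temp;
 if group=0 then output want1;
  else if group=1 then output want2;
   else output want3;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>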
      <pubDate>Wed, 13 Dec 2017 13:17:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420772#M103544</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-12-13T13:17:48Z</dc:date>
    </item>
    <item>
      <title>Re: split dataset into n folds</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420959#M103572</link>
      <description>&lt;P&gt;Using CALL RANTBL, with the table probabilities updated on every observation, allows a single-step solution that creates the new variable SUBGROUP.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=_:);
  set have nobs=nrecs;

  array needed{10} _temporary_;
  array needprob{10} _temporary_;

  if _n_=1 then do;
    do _I=1 to dim(needed); 
      needed{_I}=floor(nrecs/dim(needed));
    end;
    do _I=1 to dim(needed) while (sum(of needed{*})&amp;lt;nrecs);
      needed{_I}=needed{_I}+1;
    end;
  end;

  _nleft = nrecs-(_n_-1);
  do _I=1 to dim(needed);
    needprob{_I}=needed{_I}/_nleft;
  end;

  retain _seed 1250666;   /* fixed seed; CALL RANTBL updates _seed, and DROP=_: removes it */
  call rantbl(_seed,of needprob{*},subgroup);

  needed{subgroup}=needed{subgroup}-1;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Notes:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Changing the dimension of arrays NEEDED and NEEDPROB is all that's required to change the number of randomly populated subgroups.&lt;/LI&gt;
&lt;LI&gt;NEEDED tracks, for each subgroup, the number of observations yet to be added. It is updated dynamically with every incoming observation. The minimum and maximum starting values of NEEDED differ by no more than one, and they start out summing to NRECS.&lt;/LI&gt;
&lt;LI&gt;The NEEDPROB array holds the probabilities required by the CALL RANTBL routine: each element of NEEDED divided by the number of observations remaining to be assigned.&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 13 Dec 2017 19:25:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-dataset-into-n-folds/m-p/420959#M103572</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-12-13T19:25:57Z</dc:date>
    </item>
  </channel>
</rss>

