<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: split file into multiple files in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56461#M12051</link>
    <description>Thank you. Exactly what I'm looking for.</description>
    <pubDate>Mon, 25 Apr 2011 22:29:15 GMT</pubDate>
    <dc:creator>helloSAS</dc:creator>
    <dc:date>2011-04-25T22:29:15Z</dc:date>
    <item>
      <title>split file into multiple files</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56457#M12047</link>
      <description>Hello all,&lt;BR /&gt;
&lt;BR /&gt;
I'm trying to split a 50,000 record file into 5 different files with 10,000 records in each file. &lt;BR /&gt;
&lt;BR /&gt;
Can anyone tell me how to do this?&lt;BR /&gt;
&lt;BR /&gt;
Thanks</description>
      <pubDate>Mon, 25 Apr 2011 19:59:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56457#M12047</guid>
      <dc:creator>helloSAS</dc:creator>
      <dc:date>2011-04-25T19:59:06Z</dc:date>
    </item>
    <item>
      <title>Re: split file into multiple files</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56458#M12048</link>
      <description>Hello HelloSAS,&lt;BR /&gt;
&lt;BR /&gt;
This is a solution:&lt;BR /&gt;
[pre]&lt;BR /&gt;
data i;&lt;BR /&gt;
  do i=1 to 50000;&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
data r1 r2 r3 r4 r5;&lt;BR /&gt;
  set i;&lt;BR /&gt;
  if               _n_ LE 10000 then output r1;&lt;BR /&gt;
  else if 10000 LT _n_ LE 20000 then output r2;&lt;BR /&gt;
  else if 20000 LT _n_ LE 30000 then output r3;&lt;BR /&gt;
  else if 30000 LT _n_ LE 40000 then output r4;&lt;BR /&gt;
  else if 40000 LT _n_          then output r5;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
Sincerely,&lt;BR /&gt;
SPR</description>
      <pubDate>Mon, 25 Apr 2011 20:22:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56458#M12048</guid>
      <dc:creator>SPR</dc:creator>
      <dc:date>2011-04-25T20:22:10Z</dc:date>
    </item>
    <item>
      <title>Re: split file into multiple files</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56459#M12049</link>
      <description>Thank you for the response. I'm sorry, I wasn't very clear in my question above. The file does not always have 50,000 records; the count varies from time to time.&lt;BR /&gt;
My real need is this: I have a huge file of about 30 million records that I want to break into roughly 10 pieces. I do not want to hard-code record numbers, because the total might grow from 30 million to 32 million in the very near future, and so on. I would like SAS to calculate the break points automatically and split the file into 10 pieces.&lt;BR /&gt;
&lt;BR /&gt;
I was trying something as simple as this, but it is not exactly what I want.&lt;BR /&gt;
&lt;BR /&gt;
data r1 r2;&lt;BR /&gt;
set readxtra;&lt;BR /&gt;
if _n_ LE 50000 then output r1;  &lt;BR /&gt;
else if _n_ GT 50000  then output r2;&lt;BR /&gt;
run;</description>
      <pubDate>Mon, 25 Apr 2011 21:10:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56459#M12049</guid>
      <dc:creator>helloSAS</dc:creator>
      <dc:date>2011-04-25T21:10:15Z</dc:date>
    </item>
    <item>
      <title>Re: split file into multiple files</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56460#M12050</link>
      <description>without regard to run times...or syntax for that matter...&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
select (1+count(*)/10) into :recs from bigfile;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
data &lt;BR /&gt;
  out1 out2 out3 out4 out5 out6 out7 out8 out9 out10;&lt;BR /&gt;
set bigfile;&lt;BR /&gt;
if _N_ &amp;lt; &amp;amp;recs then output out1;&lt;BR /&gt;
else if _N_ &amp;lt; 2*&amp;amp;recs then output out2;&lt;BR /&gt;
else if _N_ &amp;lt; 3 * &amp;amp;recs then output out3;&lt;BR /&gt;
.&lt;BR /&gt;
.&lt;BR /&gt;
.&lt;BR /&gt;
else output out10;&lt;BR /&gt;
run;</description>
      <pubDate>Mon, 25 Apr 2011 22:26:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56460#M12050</guid>
      <dc:creator>DBailey</dc:creator>
      <dc:date>2011-04-25T22:26:02Z</dc:date>
    </item>
    <item>
      <title>Re: split file into multiple files</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56461#M12051</link>
      <description>Thank you. Exactly what I'm looking for.</description>
      <pubDate>Mon, 25 Apr 2011 22:29:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56461#M12051</guid>
      <dc:creator>helloSAS</dc:creator>
      <dc:date>2011-04-25T22:29:15Z</dc:date>
    </item>
    <item>
      <title>Re: split file into multiple files</title>
      <link>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56462#M12052</link>
      <description>Like this technique!&lt;BR /&gt;
but seek "less"&lt;BR /&gt;
&lt;BR /&gt;
&amp;gt; without regard to run times...or syntax for that&lt;BR /&gt;
&amp;gt; matter...&lt;BR /&gt;
&amp;gt; &lt;BR /&gt;
&amp;gt; proc sql;&lt;BR /&gt;
&amp;gt; select (1+count(*)/10) into :recs from bigfile;&lt;BR /&gt;
&amp;gt; quit;&lt;BR /&gt;
&amp;gt; &lt;BR /&gt;
&amp;gt; data &lt;BR /&gt;
&amp;gt; out1 out2 out3 out4 out5 out6 out7 out8 out9&lt;BR /&gt;
&amp;gt;  out10;&lt;BR /&gt;
&amp;gt; set bigfile;&lt;BR /&gt;
&amp;gt; if _N_ &amp;lt; &amp;amp;recs then output out1;&lt;BR /&gt;
&amp;gt; else if _N_ &amp;lt; 2*&amp;amp;recs then output out2;&lt;BR /&gt;
&amp;gt; else if _N_ &amp;lt; 3 * &amp;amp;recs then output out3;&lt;BR /&gt;
&amp;gt; .&lt;BR /&gt;
&amp;gt; .&lt;BR /&gt;
&amp;gt; .&lt;BR /&gt;
&amp;gt; else output out10;&lt;BR /&gt;
&amp;gt; run;&lt;BR /&gt;
[pre]&lt;BR /&gt;
* peter approach ;&lt;BR /&gt;
%macro genP( outs=10, prefix= peter_D, from= bigFile ); &lt;BR /&gt;
%local i  ;&lt;BR /&gt;
 data  %* generate the list of output data set names ;&lt;BR /&gt;
%do i= 1 %to &amp;amp;outs ; &amp;amp;prefix.&amp;amp;i %end ;&lt;BR /&gt;
 ;&lt;BR /&gt;
 %* derive the number of obs in each block (before last);&lt;BR /&gt;
If _n_ = 1 then  blocks + ceil( nobs/&amp;amp;outs ) ;&lt;BR /&gt;
           drop  blocks ;&lt;BR /&gt;
  &lt;BR /&gt;
  set  &amp;amp;from  nobs= nobs ;&lt;BR /&gt;
 %* now generate the lines that output to each data set;&lt;BR /&gt;
               %do i = 1 %to &amp;amp;outs ;&lt;BR /&gt;
 if _n_ LE blocks*&amp;amp;i then output &amp;amp;prefix.&amp;amp;i ; else&lt;BR /&gt;
               %end ; &lt;BR /&gt;
 put _all_/ 'E' "RROR: what's left!?" ; %* executed when &amp;amp;outs=0 ;&lt;BR /&gt;
 %put _user_ ;&lt;BR /&gt;
run ;&lt;BR /&gt;
%mend  genP ;&lt;BR /&gt;
  &lt;BR /&gt;
option mprint nosymbolgen noMlogic ;&lt;BR /&gt;
%genP( outs=3, prefix= class, from= sashelp.class )&lt;BR /&gt;
[/pre]&lt;BR /&gt;
This seems fairly flexible, so it can be validated on small data sets before risking a test on the large one.&lt;BR /&gt;
My log from the above test shows the following MPRINT lines and notes:&lt;BR /&gt;
[pre]&lt;BR /&gt;
MPRINT(GENP):   data class1 class2 class3 ;&lt;BR /&gt;
MPRINT(GENP):   If _n_ = 1 then blocks + ceil( nobs/3 ) ;&lt;BR /&gt;
MPRINT(GENP):   drop blocks ;&lt;BR /&gt;
MPRINT(GENP):   set sashelp.class nobs= nobs ;&lt;BR /&gt;
MPRINT(GENP):   if _n_ LE blocks*1 then output class1 ;&lt;BR /&gt;
MPRINT(GENP):   else if _n_ LE blocks*2 then output class2 ;&lt;BR /&gt;
MPRINT(GENP):   else if _n_ LE blocks*3 then output class3 ;&lt;BR /&gt;
MPRINT(GENP):   else put _all_/ 'E' "RROR: what's left!?" ;&lt;BR /&gt;
GENP OUTS 3&lt;BR /&gt;
GENP I 4&lt;BR /&gt;
GENP PREFIX class&lt;BR /&gt;
GENP FROM sashelp.class&lt;BR /&gt;
MPRINT(GENP):   run ;&lt;BR /&gt;
&lt;BR /&gt;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.&lt;BR /&gt;
NOTE: The data set WORK.CLASS1 has 7 observations and 5 variables.&lt;BR /&gt;
NOTE: The data set WORK.CLASS2 has 7 observations and 5 variables.&lt;BR /&gt;
NOTE: The data set WORK.CLASS3 has 5 observations and 5 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.02 seconds&lt;BR /&gt;
      user cpu time       0.01 seconds&lt;BR /&gt;
      system cpu time     0.00 seconds[/pre]&lt;BR /&gt;
I think that achieves the result with a reusable process.&lt;BR /&gt;
More interesting than dividing into blocks by _N_ might be a round-robin split using (1+mod(_n_, &amp;amp;outs)), and a random distribution among the output data sets,&lt;BR /&gt;
but that did not seem to be a requirement (this time).&lt;BR /&gt;
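&lt;BR /&gt;
A minimal sketch of the (1+mod(_n_, &amp;amp;outs)) idea, which deals observations out to the data sets in turn instead of in consecutive blocks. The macro name genRR and the prefix rr_D are made up for illustration and do not come from the posts above; treat this as a sketch under those assumptions and check it on a small data set first:&lt;BR /&gt;
[pre]&lt;BR /&gt;
* round-robin sketch (hypothetical macro genRR) ;&lt;BR /&gt;
%macro genRR( outs=10, prefix= rr_D, from= bigFile ); &lt;BR /&gt;
%local i  ;&lt;BR /&gt;
 data  %* generate the list of output data set names ;&lt;BR /&gt;
%do i= 1 %to &amp;amp;outs ; &amp;amp;prefix.&amp;amp;i %end ;&lt;BR /&gt;
 ;&lt;BR /&gt;
  set  &amp;amp;from ;&lt;BR /&gt;
 %* 1+mod(_n_, outs) cycles through the values 1 to outs ;&lt;BR /&gt;
  select( 1 + mod( _n_, &amp;amp;outs ) ) ;&lt;BR /&gt;
               %do i = 1 %to &amp;amp;outs ;&lt;BR /&gt;
    when( &amp;amp;i ) output &amp;amp;prefix.&amp;amp;i ;&lt;BR /&gt;
               %end ; &lt;BR /&gt;
    otherwise ; %* not reached when outs is 1 or more ;&lt;BR /&gt;
  end ;&lt;BR /&gt;
run ;&lt;BR /&gt;
%mend  genRR ;&lt;BR /&gt;
  &lt;BR /&gt;
%genRR( outs=3, prefix= rr, from= sashelp.class )&lt;BR /&gt;
[/pre]&lt;BR /&gt;
With the 19 observations of sashelp.class and outs=3, observation 1 goes to rr2, observation 2 to rr3, observation 3 to rr1, and so on, so rr1 and rr3 should receive 6 observations each and rr2 should receive 7. Unlike the block approach, this version needs no nobs= option, because the target data set depends only on _n_.&lt;BR /&gt;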
  &lt;BR /&gt;
peterC</description>
      <pubDate>Tue, 26 Apr 2011 16:14:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/split-file-into-multiple-files/m-p/56462#M12052</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2011-04-26T16:14:11Z</dc:date>
    </item>
  </channel>
</rss>

