I like this technique!
But I would seek "less":
> without regard to run times...or syntax for that
> matter...
>
> proc sql;
> select (1+count(*)/10) into :recs from bigfile;
> quit;
>
> data
> out1 out2 out3 out4 out5 out6 out7 out8 out9
> out10;
> set bigfile;
> if _N_ < &recs then output out1;
> else if _N_ < 2*&recs then output out2;
> else if _N_ < 3 * &recs then output out3;
> .
> .
> .
> else output out10;
> run;
[pre]
* peter approach ;
%macro genP( outs=10, prefix= peter_D, from= bigFile );
%local i ;
data %* generate the list of output data set names ;
%do i= 1 %to &outs ; &prefix.&i %end ;
;
%* derive the number of obs in each block (before last);
If _n_ = 1 then blocks + ceil( nobs/&outs ) ;
drop blocks ;
set &from nobs= nobs ;
%* now generate the lines that output to each data set;
%do i = 1 %to &outs ;
if _n_ LE blocks*&i then output &prefix.&i ; else
%end ;
put _all_/ 'E' "RROR: what's left!?" ; %* executed when &outs=0 ;
%put _user_ ;
run ;
%mend genP ;
option mprint nosymbolgen noMlogic ;
%genP( outs=3, prefix= class, from= sashelp.class ) [/pre] This seems fairly flexible, so it can be validated on small data sets before risking a test on the large one.
My log from the above test shows the following MPRINT lines and notes:[pre]MPRINT(GENP): data class1 class2 class3 ;
MPRINT(GENP): If _n_ = 1 then blocks + ceil( nobs/3 ) ;
MPRINT(GENP): drop blocks ;
MPRINT(GENP): set sashelp.class nobs= nobs ;
MPRINT(GENP): if _n_ LE blocks*1 then output class1 ;
MPRINT(GENP): else if _n_ LE blocks*2 then output class2 ;
MPRINT(GENP): else if _n_ LE blocks*3 then output class3 ;
MPRINT(GENP): else put _all_/ 'E' "RROR: what's left!?" ;
GENP OUTS 3
GENP I 4
GENP PREFIX class
GENP FROM sashelp.class
MPRINT(GENP): run ;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS1 has 7 observations and 5 variables.
NOTE: The data set WORK.CLASS2 has 7 observations and 5 variables.
NOTE: The data set WORK.CLASS3 has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds[/pre]
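The 7/7/5 split in those notes follows from the ceil( nobs/&outs ) block size. As a small check of that arithmetic (a sketch with the test values hard-coded; the variable names outs and last are only for illustration):
[pre]
data _null_ ;
  nobs   = 19 ;                        * obs in sashelp.class, per the log ;
  outs   = 3 ;                         * number of output data sets ;
  blocks = ceil( nobs/outs ) ;         * 7 obs in each full block ;
  last   = nobs - blocks*(outs-1) ;    * 5 obs left for the final data set ;
  put blocks= last= ;
run ;
[/pre]
Because the block size is rounded up, the last data set takes whatever remains, so it can be noticeably smaller than the others when nobs is not a multiple of &outs.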
I think that achieves the result with a reusable process.
More interesting than dividing up by blocks of _N_ might be dealing the observations out in turn with (1+mod(_n_, &outs)), or a random distribution among the output data sets,
but that did not seem to be a requirement (this time).
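For what it is worth, the (1+mod(_n_, &outs)) idea could be sketched along the same lines as genP. This is only an illustration, not run against a large file; the macro name genRR and the default prefix rr_D are made up for the example.
[pre]
* round-robin sketch ;
%macro genRR( outs=10, prefix= rr_D, from= bigFile );
  %local i ;
  data %* generate the list of output data set names ;
    %do i= 1 %to &outs ; &prefix.&i %end ;
  ;
    set &from ;
    %* deal each obs to the data set numbered 1+mod(_n_, outs) ;
    select( 1 + mod( _n_, &outs ) ) ;
      %do i = 1 %to &outs ;
        when( &i ) output &prefix.&i ;
      %end ;
      otherwise put _all_/ 'E' "RROR: what's left!?" ;
    end ;
  run ;
%mend genRR ;
%genRR( outs=3, prefix= rr, from= sashelp.class )
[/pre]
With the 19 observations of sashelp.class and outs=3, the selector takes the values 2, 3, 1, 2, 3, 1, ... so rr1, rr2 and rr3 should receive 6, 7 and 6 observations. For a random split, the select expression could be swapped for something like 1 + floor( &outs * rand('uniform') ), ideally after a call streaminit( ) with a fixed seed so the split can be repeated.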
peterC