BookmarkSubscribeRSS Feed
helloSAS
Obsidian | Level 7
Hello all,

I'm trying to split a 50,000 record file into 5 different files with 10,000 records in each file.

Can anyone tell me how to do this?

Thanks
5 REPLIES 5
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello HelloSAS,

This is a solution:
[pre]
data i;
do i=1 to 50000;
output;
end;
run;
data r1 r2 r3 r4 r5;
set i;
if _n_ LE 10000 then output r1;
else if 10000 LT _n_ LE 20000 then output r2;
else if 20000 LT _n_ LE 30000 then output r3;
else if 30000 LT _n_ LE 40000 then output r4;
else if 40000 LT _n_ then output r5;
run;
[/pre]
Sincerely,
SPR
helloSAS
Obsidian | Level 7
Thank you for the response. I'm sorry but I wasnt very clear in my question above. Actually my end of file is not always 50,000 records. It can vary time to time.
My real need is, I have a huge file about 30 million records that I want to break into pieces probably in 10 pieces. I do not want to hard code record numbers as this number might increase from 30 millioin to 32 milion in very near future and so on. I would probably like to SAS automatically calculate its break points and break the file into 10 peices.

I was trying to do something as simple as this, but this is not what i exactly want.

data r1 r2;
set readxtra;
if _n_ LE 50000 then output r1;
else if _n_ GT 50000 then output r2;
run;
DBailey
Lapis Lazuli | Level 10
without regard to run times...or syntax for that matter...

proc sql;
select (1+count(*)/10) into :recs into from bigfile;
quit;

data
out1 out2 out3 out4 out5 out6 out7 out8 out9 out10;
set bigfile;
if _N_ < &recs then output out1;
else if _N_ < 2*&recs then output out2;
else if _N_ < 3 * &recs then output out3;
.
.
.
else output out10;
run;
helloSAS
Obsidian | Level 7
Thank you. Exactly what I'm looking for.
Peter_C
Rhodochrosite | Level 12
Like this technique!
but seek "less"

> without regard to run times...or syntax for that
> matter...
>
> proc sql;
> select (1+count(*)/10) into :recs into from bigfile;
> quit;
>
> data
> out1 out2 out3 out4 out5 out6 out7 out8 out9
> out10;
> set bigfile;
> if _N_ < &recs then output out1;
> else if _N_ < 2*&recs then output out2;
> else if _N_ < 3 * &recs then output out3;
> .
> .
> .
> else output out10;
> run;
[pre]
* peter approach ;
%macro genP( outs=10, prefix= peter_D, from= bigFile );
%local i ;
data %* generate the list of output data set names ;
%do i= 1 %to &outs ; &prefix.&i %end ;
;
%* derive the number of obs in each block (before last);
If _n_ = 1 then blocks + ceil( nobs/&outs ) ;
drop blocks ;

set &from nobs= nobs ;
%* now generate the lines that output to each data set;
%do i = 1 %to &outs ;
if _n_ LE blocks*&i then output &prefix.&i ; else
%end ;
put _all_/ 'E' "RROR: what's left!?" ; %* executed when &outs=0 ;
%put _user_ ;
run ;
%mend genP ;

option mprint nosymbolgen noMlogic ;
%genP( outs=3, prefix= class, from= sashelp.class ) [/pre] This seems to be fairly flexible so validation can be made on small sets before risking a test on the large data set.
My log from the above test shows the following MPRINT and notes[pre]MPRINT(GENP): data class1 class2 class3 ;
MPRINT(GENP): If _n_ = 1 then blocks + ceil( nobs/3 ) ;
MPRINT(GENP): drop blocks ;
MPRINT(GENP): set sashelp.class nobs= nobs ;
MPRINT(GENP): if _n_ LE blocks*1 then output class1 ;
MPRINT(GENP): else if _n_ LE blocks*2 then output class2 ;
MPRINT(GENP): else if _n_ LE blocks*3 then output class3 ;
MPRINT(GENP): else put _all_/ 'E' "RROR: what's left!?" ;
GENP OUTS 3
GENP I 4
GENP PREFIX class
GENP FROM sashelp.class
MPRINT(GENP): run ;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS1 has 7 observations and 5 variables.
NOTE: The data set WORK.CLASS2 has 7 observations and 5 variables.
NOTE: The data set WORK.CLASS3 has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds[/pre]
think that achieves the result with re-useable process.
More interesting than dividing up by blocks of _N_, might be the (1+mod(_n_, &outs)) and random distribution among the output data sets,
but that did not seem to be a requirement.. (this time)

peterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 14332 views
  • 0 likes
  • 4 in conversation