BookmarkSubscribeRSS Feed
s_stef
Calcite | Level 5

Hello,

I have a large table that I need to split into smaller ones.

The table is currently about 360,000 rows and I would like it split into tables of 100,000 rows or less.

The number of rows in the origional table varies so it could be larger or smaller at any given time.

I have read some info on the %split macro but I have not been able to get it to work as of yet...

Thank you for your help.

7 REPLIES 7
Reeza
Super User

Why do you want to split the file, generally not recommended.

s_stef
Calcite | Level 5

I need to split it for preformance/runtime issues.  The the conents of these tables are going to be run through a seperate macro...

Reeza
Super User

~1 million records shouldn't be an issue for most processes.

Anyways explain how you'd like to split your data, with equal amounts, 100K each and the remaining in one data set, by a particular variable? What do you want name the outputs?

What is the range in the size of your data?

What %split macro are you referring to?

Here's a reference to a few different ways:

Split Data into Subsets - sasCommunity

s_stef
Calcite | Level 5

I'd like tha data split by 100k with the remaining in its own data set.

So for this one it would be:

Main Table - 360,000 rows

into

Table1 100,000

Table2 100,000

Table3 100,000

Table4 60,000


I just need it split by the number of row, variables are not driving it.

I wouldnt expect the Main table to get larger than 500,000 rows.

i was looking into this:

http://www2.sas.com/proceedings/sugi28/075-28.pdf

But I could not get it working...i'm sure i am missing quite a few steps, but here is what i tried:

%split(TQ_AcctNum)/* creating a dataset with 100000 observations*/

data dsn;

do i= 1 to 100000;

output;

end;

run;

%macro split(dsn,splitby);

data _null_;

set &dsn nobs=num;

call symput('no_obs',num);

run;

%letno_obs=&no_obs;

%doi= 1 %to %sysfunc(ceil(&no_obs/&splitby));

data dsn&i;

set &dsn (firstobs=%sysfunc(floor(%eval((&i.-1)*&splitby.+1))) obs=%sysfunc(ceil(%eval(&i * &splitby.))));

run;

%end;

%mend;

%split(dsn,10000);

Reeza
Super User

That works for me, you just have small typo, %doi=1, there needs to be space between the do and i.

Your sample call is missing the second parameter, the number of records per dataset. I also changed the call symput to symputx so you can remove trailing spaces automatically, no need for the second %let statement then.

%split(TQ_ACCNUM, 100000);

data Have;

do i= 1 to 54000;

output;

end;

run;

%macro split(dsn,splitby);

data _null_;

set &dsn nobs=num;

call symputx('no_obs',num);

run;

%put &no_obs;

%do i= 1 %to %sysfunc(ceil(&no_obs/&splitby));

data dsn&i;

set &dsn (firstobs=%sysfunc(floor(%eval((&i.-1)*&splitby.+1))) obs=%sysfunc(ceil(%eval(&i * &splitby.))));

run;

%end;

%mend;

%split(have,10000);

Haikuo
Onyx | Level 15

The following macro is based on Hash table(not vigorously tested), which natively supports dynamic splitting and supposedly more efficient (just one pass):

data Have;

do i= 1 to 54000;

K+1;

output;

end;

run;

%macro split(lib=work, dsn=have, size=10000, prefix=want);

proc sql NOPRINT;

select quote(strip(name))  into :vname separateD by ',' from dictionary.columns where LIBNAME="%upcase(&lib)" AND MEMNAME="%upcase(&dsn)";QUIT;

data _null_;

if _n_=1 then do;

if 0 then set have;

declare hash h(ordered:'a');

h.definekey('_n_');

h.definedata(&vname);

h.definedone();

end;

   set have end=last;

rc=h.add();

if last or mod(_n_,&size.) = 0 then do;

rc=h.output(dataset:"&prefix"||strip(_n_));

rc=h.clear();

     end;

run;

%mend;

%split(size=20000,prefix=test)

Haikuo

Kurt_Bremser
Super User

How many (and what type and size) columns do you have that you get runtime issues with 360.000 rows? I've rarely used more than several seconds to process this kind of datasets.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 6261 views
  • 2 likes
  • 4 in conversation