DATA Step, Macro, Functions and more

How to split a table/dataset

Reply
New Contributor
Posts: 3

How to split a table/dataset

Hello,

I have a large table that I need to split into smaller ones.

The table is currently about 360,000 rows and I would like it split into tables of 100,000 rows or less.

The number of rows in the origional table varies so it could be larger or smaller at any given time.

I have read some info on the %split macro but I have not been able to get it to work as of yet...

Thank you for your help.

Super User
Posts: 19,814

Re: How to split a table/dataset

Why do you want to split the file, generally not recommended.

New Contributor
Posts: 3

Re: How to split a table/dataset

I need to split it for preformance/runtime issues.  The the conents of these tables are going to be run through a seperate macro...

Super User
Posts: 19,814

Re: How to split a table/dataset

~1 million records shouldn't be an issue for most processes.

Anyways explain how you'd like to split your data, with equal amounts, 100K each and the remaining in one data set, by a particular variable? What do you want name the outputs?

What is the range in the size of your data?

What %split macro are you referring to?

Here's a reference to a few different ways:

Split Data into Subsets - sasCommunity

New Contributor
Posts: 3

Re: How to split a table/dataset

I'd like tha data split by 100k with the remaining in its own data set.

So for this one it would be:

Main Table - 360,000 rows

into

Table1 100,000

Table2 100,000

Table3 100,000

Table4 60,000


I just need it split by the number of row, variables are not driving it.

I wouldnt expect the Main table to get larger than 500,000 rows.

i was looking into this:

http://www2.sas.com/proceedings/sugi28/075-28.pdf

But I could not get it working...i'm sure i am missing quite a few steps, but here is what i tried:

%split(TQ_AcctNum)/* creating a dataset with 100000 observations*/

data dsn;

do i= 1 to 100000;

output;

end;

run;

%macro split(dsn,splitby);

data _null_;

set &dsn nobs=num;

call symput('no_obs',num);

run;

%letno_obs=&no_obs;

%doi= 1 %to %sysfunc(ceil(&no_obs/&splitby));

data dsn&i;

set &dsn (firstobs=%sysfunc(floor(%eval((&i.-1)*&splitby.+1))) obs=%sysfunc(ceil(%eval(&i * &splitby.))));

run;

%end;

%mend;

%split(dsn,10000);

Super User
Posts: 19,814

Re: How to split a table/dataset

That works for me, you just have small typo, %doi=1, there needs to be space between the do and i.

Your sample call is missing the second parameter, the number of records per dataset. I also changed the call symput to symputx so you can remove trailing spaces automatically, no need for the second %let statement then.

%split(TQ_ACCNUM, 100000);

data Have;

do i= 1 to 54000;

output;

end;

run;

%macro split(dsn,splitby);

data _null_;

set &dsn nobs=num;

call symputx('no_obs',num);

run;

%put &no_obs;

%do i= 1 %to %sysfunc(ceil(&no_obs/&splitby));

data dsn&i;

set &dsn (firstobs=%sysfunc(floor(%eval((&i.-1)*&splitby.+1))) obs=%sysfunc(ceil(%eval(&i * &splitby.))));

run;

%end;

%mend;

%split(have,10000);

Respected Advisor
Posts: 3,156

Re: How to split a table/dataset

The following macro is based on Hash table(not vigorously tested), which natively supports dynamic splitting and supposedly more efficient (just one pass):

data Have;

do i= 1 to 54000;

K+1;

output;

end;

run;

%macro split(lib=work, dsn=have, size=10000, prefix=want);

proc sql NOPRINT;

select quote(strip(name))  into :vname separateD by ',' from dictionary.columns where LIBNAME="%upcase(&lib)" AND MEMNAME="%upcase(&dsn)";QUIT;

data _null_;

if _n_=1 then do;

if 0 then set have;

declare hash h(ordered:'a');

h.definekey('_n_');

h.definedata(&vname);

h.definedone();

end;

   set have end=last;

rc=h.add();

if last or mod(_n_,&size.) = 0 then do;

rc=h.output(dataset:"&prefix"||strip(_n_));

rc=h.clear();

     end;

run;

%mend;

%split(size=20000,prefix=test)

Haikuo

Super User
Posts: 7,782

Re: How to split a table/dataset

How many (and what type and size) columns do you have that you get runtime issues with 360.000 rows? I've rarely used more than several seconds to process this kind of datasets.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 7 replies
  • 1584 views
  • 1 like
  • 4 in conversation