Solved: Splitting a SAS table into multiple tables bases on number of rows

SasStatistics · Posted 04-21-2022 09:20 AM

In reality I have a SAS table having very many rows, think milions, and I want to split it into sub-tables. The reason for this is that I will export the data and those who will process it use a system that cannot process so many rows at once, so it has to be split.

For clarity I will provide an demonstrative example that would solve my problems.

Assume I have a SAS table consisting of 334 rows.

Now I want to split this table into 3 tables, since 334/3 is 111.3333 the function would return:
- table1 consisting of 111 rows.

- table2 consisting of 111 rows.

- table3 consisting of 112 rows.

So I just say that I want 3 tables and then it puts the suitable amount of observations in each new table (111, 111 and 112 rows in this case).

Any ideas on how to achieve this?

Ksharp · Posted 04-21-2022 09:32 AM



proc surveyselect data=sashelp.heart groups=3 out=temp;
run;

data one two three;
 set temp;
 if GroupID=1 then output one;
 if GroupID=2 then output two;
 if GroupID=3 then output three;
run;

View solution in original post

tarheel13 · Posted 04-21-2022 09:30 AM

just make a counter variable and use that. you could do if 1 <= count <= 111 then output table1. else if count 111 < count <= 223 then output table2; else if count > 233 then output table3;

Ksharp · Posted 04-21-2022 09:32 AM



proc surveyselect data=sashelp.heart groups=3 out=temp;
run;

data one two three;
 set temp;
 if GroupID=1 then output one;
 if GroupID=2 then output two;
 if GroupID=3 then output three;
run;

Holmes · Posted 12-09-2022 03:36 PM

@Ksharp escreveu:

proc surveyselect data=sashelp.heart groups=3 out=temp;
run;

data one two three;
 set temp;
 if GroupID=1 then output one;
 if GroupID=2 then output two;
 if GroupID=3 then output three;
run;

Can I use this to make a PROC SQL split a query?

I have a long-lasting query and want to split the results in one million rows per time.

How can I do this?

Ksharp · Posted 12-10-2022 02:35 PM

Please start a brand new session, this question is almost three years ago.
" want to split the results in one million rows per time. “
You need make id variable.Like:
data have;
set have;
id+1;
run;

proc sql;
create table part1 as
select * from have where id in (1:1000000);
............
quit;

LinusH · Posted 04-21-2022 09:34 AM

First check the no of records in the table. This can be achieved by PROC DATASETS/CONTENTS, data file functions and PROC SQL DICTIONARAY.TABLES.

Calculate the intervals, and put them into macro variables.

Use a data step, and explicitly output to your target tables depending on the value of the automatic _n_ variable compared with the macro variables above.

Data never sleeps

Patrick · Posted 04-21-2022 09:42 AM

IF you can fit a single slice of your table (&max_rows_per_table) into memory then below coding approach should work.

%let max_rows_per_table=5;

data have;
  set sashelp.class;
run;

data _null_;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(obs=0)',multidata:'y');
      h1.defineKey('sex');
      h1.defineData(all:'y');
      h1.defineDone();
    end;
  set have end=last;
  h1.add();
  if mod(_n_,&max_rows_per_table)=0 or last then
    do;
      h1.output(dataset:cats('work.want_',put(_n_,z10.)));
      h1.clear();
    end;
run;

proc sql;
  select *
  from dictionary.tables
  where libname='WORK' and memname like 'WANT%'
  ;
quit;

The logic for naming the table slices could of course be different - like 1, 2, 3...

In the hash definition for h1.defineKey('<some variable>'); you can select any column that exists in your source table - but ideally choose the one with the lowest length as this will save some memory.

Using hash tables is the only way I know of that allows you to create output tables dynamically during execution time. Any other approach will require some pre-processing to define the output tables (not that hard to do but will require some additional coding to generate the required syntax dynamically).

ballardw · Posted 04-21-2022 10:36 AM

No need to create data sets, just select the records desired from the source data set.

Pseudo code example to export records to different files in groups of 100:

proc export data=have (firstobs=1 obs=100)
    outfile=<your first file>
    <other options>
;
run;

proc export data=have (firstobs=101 obs=200)
    outfile=<your 2nd file>
    <other options>
;
run;

So you can use something like this to build the observation selection bit and a filename to plug into CALL EXECUTE code with boiler plate for the proc export options. Or create strings and write to a text program file to have documentation of what was created and run.

data _null_;
   do countobs = 1 to 3025 by 300;
      groupcount+1;
      string= catx(' ',"(firstobs=",countobs," obs=",groupcount*300,')');
      put string=;
      filename = cats("<path>\filebasename",put(groupcount,z3.),'.extension');
      put filename=;
   end;
run;

Michael_Harper · Posted 04-21-2022 01:02 PM

data one two three;
set SASHELP.baseball;

which= mod(_n_,3);
if which= 0 then output one;
else do;
if which < 2 then output two;
else output three;
end;
drop which;
run;

Michael Joe Harper

PaigeMiller · Posted 04-21-2022 01:42 PM

A solution using CALL EXECUTE for the data set SASHELP.CARS

proc sql; /* Find number of observations in data set */
    select nobs into :nobs from sashelp.vtable where libname='SASHELP' and memname="CARS";
quit;
%let n_splits=5; /* User specifies number of splits desired */

data _null_;
    nobs_per_split=&nobs/&n_splits;
    do i=1 to &n_splits;
        start=round((i-1)*nobs_per_split+1);
        end=round(i*nobs_per_split);
        call execute(cat('proc export data=sashelp.cars(firstobs=',start,' obs=',end,') outfile="table',i,'.csv" dbms=csv; run;'));
    end;
run;

--
Paige Miller

Kurt_Bremser · Posted 04-21-2022 01:45 PM

No need to split into datasets. Just use the FILEVAR= option of the FILE statement in the DATA step that does the export.

In which file format do you export your data?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Re: Splitting a SAS table into multiple tables bases on number of rows

Registration is open

SAS Training: Just a Click Away