In reality I have a SAS table having very many rows, think milions, and I want to split it into sub-tables. The reason for this is that I will export the data and those who will process it use a system that cannot process so many rows at once, so it has to be split.
For clarity I will provide an demonstrative example that would solve my problems.
Assume I have a SAS table consisting of 334 rows.
Now I want to split this table into 3 tables, since 334/3 is 111.3333 the function would return:
- table1 consisting of 111 rows.
- table2 consisting of 111 rows.
- table3 consisting of 112 rows.
So I just say that I want 3 tables and then it puts the suitable amount of observations in each new table (111, 111 and 112 rows in this case).
Any ideas on how to achieve this?
proc surveyselect data=sashelp.heart groups=3 out=temp;
run;
data one two three;
set temp;
if GroupID=1 then output one;
if GroupID=2 then output two;
if GroupID=3 then output three;
run;
just make a counter variable and use that. you could do if 1 <= count <= 111 then output table1. else if count 111 < count <= 223 then output table2; else if count > 233 then output table3;
proc surveyselect data=sashelp.heart groups=3 out=temp;
run;
data one two three;
set temp;
if GroupID=1 then output one;
if GroupID=2 then output two;
if GroupID=3 then output three;
run;
@Ksharp escreveu:proc surveyselect data=sashelp.heart groups=3 out=temp; run; data one two three; set temp; if GroupID=1 then output one; if GroupID=2 then output two; if GroupID=3 then output three; run;
Can I use this to make a PROC SQL split a query?
I have a long-lasting query and want to split the results in one million rows per time.
How can I do this?
First check the no of records in the table. This can be achieved by PROC DATASETS/CONTENTS, data file functions and PROC SQL DICTIONARAY.TABLES.
Calculate the intervals, and put them into macro variables.
Use a data step, and explicitly output to your target tables depending on the value of the automatic _n_ variable compared with the macro variables above.
IF you can fit a single slice of your table (&max_rows_per_table) into memory then below coding approach should work.
%let max_rows_per_table=5;
data have;
set sashelp.class;
run;
data _null_;
if _n_=1 then
do;
dcl hash h1(dataset:'have(obs=0)',multidata:'y');
h1.defineKey('sex');
h1.defineData(all:'y');
h1.defineDone();
end;
set have end=last;
h1.add();
if mod(_n_,&max_rows_per_table)=0 or last then
do;
h1.output(dataset:cats('work.want_',put(_n_,z10.)));
h1.clear();
end;
run;
proc sql;
select *
from dictionary.tables
where libname='WORK' and memname like 'WANT%'
;
quit;
The logic for naming the table slices could of course be different - like 1, 2, 3...
In the hash definition for h1.defineKey('<some variable>'); you can select any column that exists in your source table - but ideally choose the one with the lowest length as this will save some memory.
Using hash tables is the only way I know of that allows you to create output tables dynamically during execution time. Any other approach will require some pre-processing to define the output tables (not that hard to do but will require some additional coding to generate the required syntax dynamically).
No need to create data sets, just select the records desired from the source data set.
Pseudo code example to export records to different files in groups of 100:
proc export data=have (firstobs=1 obs=100) outfile=<your first file> <other options> ; run; proc export data=have (firstobs=101 obs=200) outfile=<your 2nd file> <other options> ; run;
So you can use something like this to build the observation selection bit and a filename to plug into CALL EXECUTE code with boiler plate for the proc export options. Or create strings and write to a text program file to have documentation of what was created and run.
data _null_; do countobs = 1 to 3025 by 300; groupcount+1; string= catx(' ',"(firstobs=",countobs," obs=",groupcount*300,')'); put string=; filename = cats("<path>\filebasename",put(groupcount,z3.),'.extension'); put filename=; end; run;
data one two three;
set SASHELP.baseball;
which= mod(_n_,3);
if which= 0 then output one;
else do;
if which < 2 then output two;
else output three;
end;
drop which;
run;
A solution using CALL EXECUTE for the data set SASHELP.CARS
proc sql; /* Find number of observations in data set */
select nobs into :nobs from sashelp.vtable where libname='SASHELP' and memname="CARS";
quit;
%let n_splits=5; /* User specifies number of splits desired */
data _null_;
nobs_per_split=&nobs/&n_splits;
do i=1 to &n_splits;
start=round((i-1)*nobs_per_split+1);
end=round(i*nobs_per_split);
call execute(cat('proc export data=sashelp.cars(firstobs=',start,' obs=',end,') outfile="table',i,'.csv" dbms=csv; run;'));
end;
run;
No need to split into datasets. Just use the FILEVAR= option of the FILE statement in the DATA step that does the export.
In which file format do you export your data?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.