BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SasStatistics
Pyrite | Level 9

In reality I have a SAS table having very many rows, think milions, and I want to split it into sub-tables. The reason for this is that I will export the data and those who will process it use a system that cannot process so many rows at once, so it has to be split. 

For clarity I will provide an demonstrative example that would solve my problems. 

 

Assume I have a SAS table consisting of 334 rows. 

Now I want to split this table into 3 tables, since 334/3 is 111.3333 the function would return: 
- table1 consisting of 111 rows. 

- table2 consisting of 111 rows. 

- table3 consisting of 112 rows. 


So I just say that I want 3 tables and then it puts the suitable amount of observations in each new table (111, 111 and 112 rows in this case). 

 

Any ideas on how to achieve this? 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User


proc surveyselect data=sashelp.heart groups=3 out=temp;
run;

data one two three;
 set temp;
 if GroupID=1 then output one;
 if GroupID=2 then output two;
 if GroupID=3 then output three;
run;

View solution in original post

10 REPLIES 10
tarheel13
Rhodochrosite | Level 12

just make a counter variable and use that. you could do if 1 <= count <= 111 then output table1. else if count 111 < count <= 223 then output table2; else if count > 233 then output table3;

Ksharp
Super User


proc surveyselect data=sashelp.heart groups=3 out=temp;
run;

data one two three;
 set temp;
 if GroupID=1 then output one;
 if GroupID=2 then output two;
 if GroupID=3 then output three;
run;
Holmes
Obsidian | Level 7

@Ksharp  escreveu:
proc surveyselect data=sashelp.heart groups=3 out=temp;
run;

data one two three;
 set temp;
 if GroupID=1 then output one;
 if GroupID=2 then output two;
 if GroupID=3 then output three;
run;

Can I use this to make a PROC SQL split a query?

I have a long-lasting query and want to split the results in one million rows per time. 

How can I do this?

Ksharp
Super User
Please start a brand new session, this question is almost three years ago.
" want to split the results in one million rows per time. “
You need make id variable.Like:
data have;
set have;
id+1;
run;

proc sql;
create table part1 as
select * from have where id in (1:1000000);
............
quit;
LinusH
Tourmaline | Level 20

First check the no of records in the table. This can be achieved by PROC DATASETS/CONTENTS, data file functions and PROC SQL DICTIONARAY.TABLES.

Calculate the intervals, and put them into macro variables.

Use a data step, and explicitly output to your target tables depending on the value of the automatic _n_ variable compared with the macro variables above.

Data never sleeps
Patrick
Opal | Level 21

IF you can fit a single slice of your table (&max_rows_per_table) into memory then below coding approach should work.

%let max_rows_per_table=5;

data have;
  set sashelp.class;
run;

data _null_;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(obs=0)',multidata:'y');
      h1.defineKey('sex');
      h1.defineData(all:'y');
      h1.defineDone();
    end;
  set have end=last;
  h1.add();
  if mod(_n_,&max_rows_per_table)=0 or last then
    do;
      h1.output(dataset:cats('work.want_',put(_n_,z10.)));
      h1.clear();
    end;
run;

proc sql;
  select *
  from dictionary.tables
  where libname='WORK' and memname like 'WANT%'
  ;
quit;

Patrick_0-1650548767889.png

The logic for naming the table slices could of course be different - like 1, 2, 3...

In the hash definition for h1.defineKey('<some variable>'); you can select any column that exists in your source table - but ideally choose the one with the lowest length as this will save some memory. 

 

Using hash tables is the only way I know of that allows you to create output tables dynamically during execution time. Any other approach will require some pre-processing to define the output tables (not that hard to do but will require some additional coding to generate the required syntax dynamically).

ballardw
Super User

No need to create data sets, just select the records desired from the source data set.

Pseudo code example to export records to different files in groups of 100:

proc export data=have (firstobs=1 obs=100)
    outfile=<your first file>
    <other options>
;
run;

proc export data=have (firstobs=101 obs=200)
    outfile=<your 2nd file>
    <other options>
;
run;

So you can use something like this to build the observation selection bit and a filename to plug into CALL EXECUTE code with boiler plate for the proc export options. Or create strings and write to a text program file to have documentation of what was created and run.

data _null_;
   do countobs = 1 to 3025 by 300;
      groupcount+1;
      string= catx(' ',"(firstobs=",countobs," obs=",groupcount*300,')');
      put string=;
      filename = cats("<path>\filebasename",put(groupcount,z3.),'.extension');
      put filename=;
   end;
run;
Michael_Harper
Obsidian | Level 7

data one two three;
set SASHELP.baseball;

which= mod(_n_,3);
if which= 0 then output one;
else do;
if which < 2 then output two;
else output three;
end;
drop which;
run;

Michael Joe Harper
PaigeMiller
Diamond | Level 26

A solution using CALL EXECUTE for the data set SASHELP.CARS 

 

proc sql; /* Find number of observations in data set */
    select nobs into :nobs from sashelp.vtable where libname='SASHELP' and memname="CARS";
quit;
%let n_splits=5; /* User specifies number of splits desired */

data _null_;
    nobs_per_split=&nobs/&n_splits;
    do i=1 to &n_splits;
        start=round((i-1)*nobs_per_split+1);
        end=round(i*nobs_per_split);
        call execute(cat('proc export data=sashelp.cars(firstobs=',start,' obs=',end,') outfile="table',i,'.csv" dbms=csv; run;'));
    end;
run;
--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 5446 views
  • 9 likes
  • 10 in conversation