thanikondharish
Fluorite | Level 6

data ex1;

set table1 table2 table3 table4 ..........tablen;

run;

 

data ex2;

merge table1 table2 table3 table4 ..........tablen;

run;

In the above two steps, how many data sets can we list (table1, table2, ... up to tablen)?

I think we can use 256 datasets. Kindly let me know.
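As an aside, a MERGE of many tables is usually a match-merge with a BY statement, which requires each input to be sorted by the BY variables. A minimal sketch (the table names and the id variable are hypothetical, just for illustration):

```sas
/* Hypothetical match-merge sketch: table1-table3 and the   */
/* variable ID are made-up names for illustration only.     */
proc sort data=table1; by id; run;
proc sort data=table2; by id; run;
proc sort data=table3; by id; run;

data ex2;
  merge table1 table2 table3;
  by id;
run;
```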

 

5 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

As far as I know off the top of my head, the 256-table limit applies to PROC SQL.

I do not think the SET and MERGE statements in the DATA step have an explicit limit on the number of tables, and if there is a limit, it is certainly larger than 256.

But this is very easy to test of course!!
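One quick (untested) way to run such a test is a macro %DO loop that generates N tiny tables and then concatenates them; N and the table names here are arbitrary:

```sas
/* Rough test sketch: build N one-row tables, then SET them all. */
%macro settest(n=500);
  %local i;
  %do i=1 %to &n;
    data tbl&i;
      x=&i;
    run;
  %end;

  /* the inner %DO emits the list: tbl1 tbl2 ... tblN */
  data all;
    set %do i=1 %to &n; tbl&i %end; ;
  run;
%mend settest;

%settest(n=500)
```

Raising n until the step fails would reveal any practical limit on your machine.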

 

Cheers,

Koen

 

Patrick
Opal | Level 21

Using the sample code below for testing, I encountered an error starting with dataset #7513 (not caused by running out of disk space).

I consider your question rather academic, though, and I'd advise revisiting any design where you have to deal with a double-digit number of tables at once.

data _null_;
  /* define the PDV variable NAME without reading any rows */
  if 0 then set sashelp.class(keep=name);
  dcl hash h1(dataset:'sashelp.class(where=(name="Alfred"))');
  h1.defineKey('name');
  h1.defineData('name');
  h1.defineDone();
  /* write 2000 one-row tables: tbl_000001 .. tbl_002000 */
  do i=1 to 2000;
    h1.output(dataset:cats('tbl_',put(i,z6.)));
  end;
  stop;
run;

data test;
  set tbl_:;
run;

proc datasets lib=work nolist nodetails;
  delete tbl_:;
  run;
quit;

proc contents data=test;
run;quit;


 

mkeintz
PROC Star

@Patrick wrote:

....

I'd advice to re-visit any design where you have to deal with a double digit of tables at once.


I agree this is a preferable state of affairs.  However, we have a very significant collection of data for which there is very good justification to deal with 20 to 30 tables at once - sometimes 200.

 

In particular, we have two SAS datasets for each day of trading on the major American stock exchanges.  One dataset (the actual trades) has hundreds of millions of records per day (with timestamps to the nanosecond) - every day has exactly the same 15 or so variables.  The other ("quotes" - i.e. offers to buy or sell) has an order of magnitude more records, but is almost as skinny - about 20 variables.

 

The trade dataset name is CTyyyymmdd and the quotes is named CQyyyymmdd.

 

Frequently a user needs to process a month of data - i.e. they will use SET CT201206: to read in trades for the month of June 2012.  The issue becomes a bit problematic when the user gets greedy and tries to get data for a whole year - about 200 datasets (200 trading days).

 

But the problem is usually related to memory, which is often solved by using the OPEN=DEFER option of SET - which tells SAS to re-use the same input buffer for each incoming dataset, rather than the default of one buffer per dataset.  Of course, this benefits from knowing that each dataset has the same collection of variables; OPEN=DEFER will not honor the introduction of new variables after the first dataset in the list.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append to one. That can be costly. Consider voting for "Add a HASH object method which would append a hash object to an existing SAS data set".

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User

You must have a small machine.  I was able to use your program to build/read 20,000 tiny datasets.

 

Note you can reduce the amount of memory used by adding the OPEN=DEFER option to the SET statement.

options fullstimer ;
data test;
  set tbl_000001-tbl_001000;
run;

data test;
  length name $8;
  set tbl_000001-tbl_001000 open=defer;
run;

Here are the two FULLSTIMER outputs:

NOTE: DATA statement used (Total process time):
      real time           0.25 seconds
      user cpu time       0.26 seconds
      system cpu time     0.00 seconds
      memory              192604.68k
      OS Memory           446336.00k
      Timestamp           04/03/2021 11:12:47 AM
      Step Count                        18  Switch Count  0
      Page Faults                       1
      Page Reclaims                     34273
      Page Swaps                        0
      Voluntary Context Switches        5
      Involuntary Context Switches      4
      Block Input Operations            352
      Block Output Operations           392


NOTE: DATA statement used (Total process time):
      real time           0.13 seconds
      user cpu time       0.13 seconds
      system cpu time     0.00 seconds
      memory              2285.09k
      OS Memory           145772.00k
      Timestamp           04/03/2021 11:12:48 AM
      Step Count                        19  Switch Count  0
      Page Faults                       0
      Page Reclaims                     45
      Page Swaps                        0
      Voluntary Context Switches        1
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           264
Patrick
Opal | Level 21

@mkeintz 

I was more thinking about merge/join when I made my statement. You're of course right that stacking of transactional/daily data is not that uncommon and can easily involve >10 tables.

I never fully appreciated the impact the OPEN=DEFER option has on memory. Very valuable to keep in mind. Thanks for that!

 

@Tom 

The limiting factor for me was the default setting of MEMMAXSZ (2 GB). Once I increased this value, I could process more tables.
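For reference, MEMMAXSZ is (as far as I know) a system option that can only be set when SAS starts, so changing it means restarting the session; a small sketch to check the current value:

```sas
/* Check the current MEMMAXSZ setting. To change it, restart  */
/* SAS with something like (invocation syntax may vary by     */
/* platform):  sas -memmaxsz 4G                               */
proc options option=memmaxsz value;
run;
```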

Discussion stats
  • 5 replies
  • 3018 views
  • 6 likes
  • 5 in conversation