data ex1;
set table1 table2 table3 table4 ..........tablen;
run;
data ex2;
merge table1 table2 table3 table4 ..........tablen;
run;
In the above two steps, how many datasets can we list as table1, table2, ... up to?
I think we can use 256 datasets. Kindly let me know.
Hello,
As far as I remember, the 256-table limit applies to PROC SQL.
I don't think the SET and MERGE statements in the data step have an explicit limit on the number of tables, and if there is a limit it's certainly larger than 256.
But this is very easy to test, of course!
Cheers,
Koen
Using the sample code below for testing, I encountered an error starting with dataset #7513 (not caused by running out of disk space).
I consider your question rather academic, though, and I'd advise revisiting any design where you have to deal with a double-digit number of tables at once.
data _null_;
  /* declare NAME in the PDV so the hash object can host it */
  if 0 then set sashelp.class(keep=name);
  dcl hash h1(dataset:'sashelp.class(where=(name="Alfred"))');
  h1.defineKey('name');
  h1.defineData('name');
  h1.defineDone();
  /* write the one-row hash out to 2000 separate datasets tbl_000001 ... tbl_002000 */
  do i=1 to 2000;
    h1.output(dataset:cats('tbl_',put(i,z6.)));
  end;
  stop;
run;
data test;
set tbl_:;
run;
proc datasets lib=work nolist nodetails;
delete tbl_:;
run;
quit;
proc contents data=test;
run;quit;
@Patrick wrote:
....
I'd advise revisiting any design where you have to deal with a double-digit number of tables at once.
I agree this is a preferable state of affairs. However, we have a very significant collection of data where we have very good justification for dealing with 20 to 30 tables at once - sometimes 200.
In particular, we have two SAS datasets for each day of trading on the major American stock exchanges. One dataset (the actual trades) has hundreds of millions of records per day (with timestamps to the nanosecond) - every day has exactly the same 15 or so variables. The other ("quotes", i.e. offers to buy or sell) has an order of magnitude more records but is almost as skinny - about 20 variables.
The trades dataset is named CTyyyymmdd and the quotes dataset is named CQyyyymmdd.
Frequently a user needs to process a month of data - i.e. they will use SET CT201206: to read in the trades for June 2012. The issue becomes a bit problematic when the user gets greedy and tries to get data for a whole year - about 200 datasets (200 trading days).
But the problem is usually related to memory, which is often solved by using the OPEN=DEFER option of the SET statement - it tells SAS to re-use the same input buffer for each incoming dataset, rather than the default of one buffer per dataset. Of course this benefits from knowing that each dataset has the same collection of variables: OPEN=DEFER will not honor the introduction of new variables after the first dataset in the list.
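For illustration, a minimal sketch of that monthly read (assuming a libref taq pointing to the library holding the daily CTyyyymmdd members - the libref name is just an assumption here) could look like this:

data trades_jun2012;
  /* name-prefix list picks up every CT201206* member; OPEN=DEFER re-uses one input buffer */
  set taq.ct201206: open=defer;
run;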
You must have a small machine. I was able to use your program to build/read 20,000 tiny datasets.
Note you can reduce the amount of memory used by adding the OPEN=DEFER option to the SET statement.
options fullstimer;   /* add memory statistics to the step notes in the log */

data test;
  /* default behaviour: one input buffer per incoming dataset */
  set tbl_000001-tbl_001000;
run;

data test;
  length name $8;
  /* OPEN=DEFER: re-use a single input buffer for all incoming datasets */
  set tbl_000001-tbl_001000 open=defer;
run;
Here are the two FULLSTIMER outputs:
NOTE: DATA statement used (Total process time):
      real time                     0.25 seconds
      user cpu time                 0.26 seconds
      system cpu time               0.00 seconds
      memory                        192604.68k
      OS Memory                     446336.00k
      Timestamp                     04/03/2021 11:12:47 AM
      Step Count                    18  Switch Count  0
      Page Faults                   1
      Page Reclaims                 34273
      Page Swaps                    0
      Voluntary Context Switches    5
      Involuntary Context Switches  4
      Block Input Operations        352
      Block Output Operations       392

NOTE: DATA statement used (Total process time):
      real time                     0.13 seconds
      user cpu time                 0.13 seconds
      system cpu time               0.00 seconds
      memory                        2285.09k
      OS Memory                     145772.00k
      Timestamp                     04/03/2021 11:12:48 AM
      Step Count                    19  Switch Count  0
      Page Faults                   0
      Page Reclaims                 45
      Page Swaps                    0
      Voluntary Context Switches    1
      Involuntary Context Switches  0
      Block Input Operations        0
      Block Output Operations       264
I was thinking more of merge/join when I made my statement. You're of course right that stacking transactional/daily data is not that uncommon and can easily involve more than 10 tables.
I'd never fully appreciated the impact the OPEN=DEFER option has on memory. Very valuable to keep in mind. Thanks for that!
The limiting factor for me was the default setting of MEMMAXSZ (2GB). Once I increased this value I could process more tables.
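If you want to check the current value in your own session, PROC OPTIONS can display it (a minimal sketch; to my understanding MEMMAXSZ is a platform-specific option that can normally only be changed at SAS invocation, so treat that detail as an assumption):

/* show the current MEMMAXSZ setting; availability depends on the host platform */
proc options option=memmaxsz;
run;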