Once you get to REAL data sets (50 million rows is in this range), you need to take care of your storage infrastructure. a) make it FAST, using high-rpm disks or SSDs for the work area. If you are concerned about failsafes, use RAID1 (simple mirrors). If being failsafe is not a big thing, use striping b) separate your UTILLOC physically from the work/data location, and make sure these disks do nothing else. UTILLOC is where the intermediate file is stored during PROC SORT Then look at this: a) use a combination of proc sort and data steps to do the merge. PROC SQL is a resource hog of the nth order when it comes to large joins. Real life experience here has shown that SQL gets progressively slower when several processes are running, much more than the sort/merge steps. Up to a point where the server becomes unresponsive, which is very rare with an AIX system(!). b) indexes usually don't help (much), because in addition to the data, SAS needs to read the index, causing even more I/O. Indexes are very good if you need to access a small subset of data. c) identify which sort criteria will be needed most, and have your data sets already sorted correctly when you store them. That way users (including yourself) do not need to sort and can read the big datasets sequentially. d) when you do a data step merge, you need space for (just) the source files and the target files. With SQL, you also need space for the utiilty file, which will grow to a size equal of all the source files together. During the sorts preceding the merge, you only need extra space for the file being sorted, the temp file will be in UTILLOC
... View more