A few thoughts... I'm sure you've already tried some of these.
1. Turn on the most detailed performance statistics available by submitting the statement OPTIONS FULLSTIMER; (you could put that in a program node). Also, make sure the project log is turned on so you can review all the log info in one place. In EG 4.1 that's under File / Properties / Project Log. Reviewing the project log with FULLSTIMER set will show you which steps use the most resources in terms of I/O, memory, CPU time, and real time.
2. Indexes can improve the performance of subsetting and joins. Since you're rebuilding the data every day, it might not be worth the overhead to recreate indexes, but if you do a lot of subsequent subsetting etc. it might be worth a try.
3. Joins may give you better performance than data step merges. Of course, you may be doing joins already.
4. Pre-sorting intelligently can sometimes reduce or eliminate insufficient memory messages. I'm thinking about procedures like PROC MEANS with a CLASS statement or PROC REPORT: once the data is sorted you can use a BY statement instead, which processes one group at a time. Of course, sorting itself takes a lot of resources.
5. If you're firing off many separate types of analysis that are grouped by the same set of columns, pre-sorting based on that set of columns can reduce EG overhead.
6. Assuming you can get the insufficient memory problems under control, what's wrong with just scheduling your monster project to run at 4 AM each day? Ultimately, all the coding enhancements in the world will only get you so far since you're processing such a large volume of data. After that, you'd really have to upgrade your hardware and/or find a better storage medium for this much data (e.g. SPD file, Teradata, etc.).
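For what it's worth, here's a rough sketch of items 1, 2, 4 and 5 in code. All the names are made up: WORK.BIG stands in for your daily table, REGION and PRODUCT for the shared grouping columns, and AMOUNT for a measure. The small DATA step only creates toy data so the sketch runs on its own. I'm assuming that BY-group processing on pre-sorted data is what eases the memory pressure, since CLASS has to keep every level combination in memory at once; the FULLSTIMER output should tell you whether that holds for your data.

```sas
/* Item 1: write full resource statistics to the log for every step. */
options fullstimer;

/* Toy data so the sketch is self-contained.
   WORK.BIG, REGION, PRODUCT and AMOUNT are hypothetical names. */
data work.big;
  do region = 'East', 'West';
    do product = 'A', 'B';
      do i = 1 to 3;
        amount = i * 10;
        output;
      end;
    end;
  end;
  drop i;
run;

/* Item 2: create a simple index on a column used for repeated subsetting. */
proc datasets library=work nolist;
  modify big;
  index create region;
quit;

/* Items 4 and 5: sort once by the shared grouping columns.
   OUT= writes a new table, so the indexed original is left untouched. */
proc sort data=work.big out=work.big_sorted;
  by region product;
run;

/* Use BY instead of CLASS so PROC MEANS handles one group at a time. */
proc means data=work.big_sorted noprint;
  by region product;
  var amount;
  output out=work.summary sum=total_amount;
run;
```

Any later step that groups by the same columns can read WORK.BIG_SORTED with the same BY statement, so you only pay for the sort once.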
Again, you may have already tried many of these suggestions, and some may not apply to your project. Maybe one of them will help! Which steps, specifically, are generating the insufficient memory messages? PROC SQL? PROC REPORT?