The answer to this depends on the exact details.
In particular, there are a few reasonable options here, depending on your system, the datasets, etc.
If the many datasets with 800+ variables are either:
a) identical in metadata (variable names, types, lengths), or
b) a combination of identically defined variables plus non-overlapping variable names,
then there is a simple solution: one data step that runs every dataset through the same lookup logic, loading the hash tables only once.
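A minimal sketch of that pattern, assuming a lookup table LOOKUP keyed on KEY and target datasets DS1-DS3 (all names hypothetical):

```sas
data want;
  /* Load the lookup hash once, on the first iteration only */
  if _n_ = 1 then do;
    length value 8;
    declare hash h(dataset: 'lookup');
    h.defineKey('key');
    h.defineData('value');
    h.defineDone();
    call missing(value);
  end;

  /* One SET statement reads every dataset, so the hash persists */
  set ds1 ds2 ds3 indsname=source;
  length dsname $ 41;
  dsname = source;        /* remember which dataset each row came from */

  if h.find() = 0;        /* keep rows that match the lookup */
run;
```

With option (b), the same step still works; variables that exist in only some of the datasets are simply missing on rows from the others.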
---
If your datasets are non-identical and have overlapping variable names that are not identically typed, then you can't easily do that. You might be able to with some creative renaming; I've done that, renaming every variable (through a macro) to DS_<var>, where DS was a prefix for that dataset, or the dataset name itself if it was short enough. It's just ... messy, and error-prone.
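Roughly, the rename list can be generated from DICTIONARY.COLUMNS; here DS1 in WORK is hypothetical, and this assumes the prefixed names stay within the 32-character limit:

```sas
/* Build a RENAME= list that prefixes every variable in DS1 */
proc sql noprint;
  select catx('=', name, cats('DS1_', name))
    into :renamelist separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'DS1';
quit;

data ds1_prefixed;
  set ds1(rename=(&renamelist));
run;
```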
You can use DOSUBL to generate data steps inside your main data step to do the matching work while keeping your hash tables persistent, but that's quite slow and probably no faster than just reloading the datasets into hash tables.
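A bare-bones version of that approach, assuming a hypothetical control table DATASET_LIST with one dataset name per row in DSNAME:

```sas
data _null_;
  set dataset_list;                  /* one row per dataset to process */
  length code $ 200;
  code = catx(' ',
    'data', cats('out_', dsname), ';',
    'set', dsname, ';',
    /* the matching work would be generated here */
    'run;');
  rc = dosubl(code);                 /* submits the generated step immediately */
  if rc ne 0 then putlog 'ERR' 'OR: DOSUBL failed for ' dsname=;
run;
```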
---
Third, you could consider formats instead of hash tables. Formats are persistent and quite fast to look up, and if you're only writing a couple of them per dataset, not slow to load either. They do sometimes have performance issues when the number of formats gets high, but in a server environment you might be okay there.
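A sketch of the format route, again assuming a hypothetical lookup table LOOKUP with character key KEY and label VALUE:

```sas
/* Turn the lookup table into a format via CNTLIN */
data cntlin;
  set lookup(rename=(key=start value=label));
  retain fmtname 'lookupf' type 'C';   /* character format $LOOKUPF. */
run;

proc format cntlin=cntlin;
run;

/* Formats persist for the session, so later steps just use PUT() */
data want;
  set ds1;
  length value $ 40;
  value = put(key, $lookupf.);
run;
```

Note that keys missing from the format fall through as their own value unless you add an HLO='O' row to define a default.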
---
Fourth, another probably-slow-but-maybe-worth-checking option is PROC DS2. That would let you have persistent hash tables, I believe, and process each of your datasets. DS2 tends to be slow, though, on simple data reads.
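A rough DS2 sketch of that idea; the table names are hypothetical, and I'm assuming DS2's SET statement will take a list of tables the way the data step's does:

```sas
proc ds2;
  data out (overwrite=yes);
    declare double value;            /* lookup result, hypothetical */
    declare package hash h();
    method init();
      /* Hash is loaded once for the whole data program */
      h.defineKey('key');
      h.defineData('value');
      h.dataset('lookup');
      h.defineDone();
    end;
    method run();
      set ds1 ds2 ds3;               /* the hash persists across all of them */
      if h.find() = 0 then output;
    end;
  enddata;
run;
quit;
```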
If you can give us a bit more information about your problem's full scope, we can probably help you find the specific solution that's most appropriate.