Yeah, I know it's an Oracle thing but I wanted a SAS user perspective rather than just the Oracle. At the end of the day, I need to know how to use SAS in the Oracle environment and Oracle people can't help with that.
That being said, I went to one of our Oracle experts (I didn't know she was an Oracle expert or I would have started with her). She did give some things to optimize but we think we figured out what was going on. I think this is probably of interest to SAS coders and I'm curious if proc sql handles this the same way. It turns out that the DISTINCT statement executes after all other selecting is done. So in my case, after the primary query had returned the 2.1 million records the DISTINCT then executed to reduce the size down to 120,000. In the execution of the DISTINCT (in Oracle at least, I'm curious as to proc sql and other database types) there is an implicit sort that occurs. As we all know, sorts can be very memory intensive and this is what was causing the poor time performance. As a user I am only allocated a limited amount of memory when I connect to the database. The database had to hold the 2.1 million records, perform the sort, and store the resultant table. Because of the size of the dataset, the majority of my allocated memory was already occupied. In order to accomplish the DISTINCT, Oracle was performing all sorts of gymnastics to open up space for temporary use, use it, and then clear it again. The solutions are to be allocated more memory (not going to happen), use less memory (i.e. return fewer records) or not perform the DISTINCT. I went with option number three.
As an aside, I was also able to find a different index to select on and added some other code efficiencies that I was unaware of. The end result was twofold, I reduced the run time from 2 hours down to under 5 minutes and I made a new Oracle friend
... View more