We have a situation where multiple users use Enterprise Guide (v3 with hotfix 3 installed) to query
star schemas in SAS (v8.2 TS2M0 on AIX).
Some queries are resolved by sort-merge, others by index joins.
We are running into performance problems with the index joins.
SAS data set pages are usually requested from disk multiple times
because the index column value is spread over the data set pages
(unless it is sorted on the index column values, but that's not always the case).
For instance, data set page 1 may contain dates 01jan2000 and 02jan2000.
When all observations for January 2000 are queried, page 1 will be
fetched twice: once for 01jan2000 and once for 02jan2000. This works
fine if the system load is low and the data set pages remain in the
system file cache between the two reads. However, when the system is
busy, these pages are constantly swapped out of the file cache, and have
to be fetched from disk again.
This thrashing causes performance to fluctuate heavily.
When the system load is low, such a query responds in 14 seconds.
When the system load is high, the same query takes 14 minutes.
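To make the mechanism concrete, here is a small illustrative sketch (not SAS code, just a model of the behavior described above): an index join requests the same data set page once per matching key value, so whether those repeat requests hit the cache or go back to disk depends entirely on whether the page survived in the file cache in between. The cache sizes and page numbers are made up for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal model of a file cache holding at most `capacity` pages,
    evicting the least recently used page when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.disk_reads = 0

    def fetch(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # cache hit: refresh recency
        else:
            self.disk_reads += 1                # cache miss: read from disk
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used

# An index join visits pages in key order, not page order. If page 1 holds
# both 01jan2000 and 02jan2000, it is requested once per key value:
index_access = [1, 2, 3, 1, 2, 3]

quiet = LRUCache(capacity=3)   # idle system: pages stay cached between reads
for p in index_access:
    quiet.fetch(p)

busy = LRUCache(capacity=1)    # busy system: pages evicted between reads
for p in index_access:
    busy.fetch(p)

print(quiet.disk_reads)  # 3 -- each page read from disk only once
print(busy.disk_reads)   # 6 -- every repeat request goes back to disk
```

The same access pattern costs twice as many physical reads when the cache is under pressure, which is exactly the 14-seconds-versus-14-minutes swing, scaled down.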
Users find this variation very hard to work with.
We assigned a file cache on the AIX machine of about 8GB, but SAS data
set sizes are often larger than that. When any user submits a query
that performs a table scan on such a table, the complete file cache is
overwritten by the newly fetched data set pages, even though they are
only used once for this table scan. I would rather have the table scan
bypass the cache entirely, leaving the pages in place for use by the
index joins.
Has anybody else run into this, and found a solution?
I would like AIX to use file cache only for index joins and not for table scans.
I'm pretty sure that can't be done, but maybe there are alternatives.
I could imagine a smarter file caching algorithm that keeps pages in
cache depending on their usage.
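The kind of policy I have in mind could look something like this sketch of a 2Q/SLRU-style "scan-resistant" cache: a page must be touched twice before it may displace established pages, so a one-pass table scan cannot flush the pages an index join keeps reusing. This is purely illustrative; real buffer managers use more elaborate variants (ARC, LRU-K), and I don't know of a way to get this behavior out of the AIX file cache itself.

```python
from collections import OrderedDict

class ScanResistantCache:
    """Two-tier cache sketch: first-touch pages sit in a probation list;
    only a second touch promotes a page into the protected list. A large
    sequential scan therefore only churns probation and never evicts the
    frequently reused (index join) pages."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.probation = OrderedDict()  # pages seen once (e.g. scan pages)
        self.protected = OrderedDict()  # pages seen twice or more (hot pages)

    def fetch(self, page):
        hit = page in self.protected or page in self.probation
        if page in self.protected:
            self.protected.move_to_end(page)
        elif page in self.probation:
            del self.probation[page]            # second touch: promote
            self.protected[page] = True
            if len(self.protected) > self.capacity:
                self.protected.popitem(last=False)
        else:
            self.probation[page] = True         # first touch: probation only
            if len(self.probation) > self.capacity:
                self.probation.popitem(last=False)
        return hit

cache = ScanResistantCache(capacity=4)
hot = ["idx1", "idx2", "idx3"]
for p in hot * 2:                 # index-join pages get touched twice
    cache.fetch(p)
for p in range(1000):             # a huge one-pass table scan
    cache.fetch(f"scan{p}")
print(all(cache.fetch(p) for p in hot))  # True -- hot pages survived the scan
```

With plain LRU, the 1000-page scan would have evicted the three hot pages; here they survive because scan pages never earn a second touch.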
Any suggestions are appreciated.