07-16-2015 10:01 AM
Hello SAS experts,
Could someone please explain to me how to determine the best combination of BUFSIZE/BUFNO values to optimize the performance of my SAS program?
It acts as an ETL for data retrieved from views in an Oracle database (which works fine using the READBUFF option on the libname). SAS 9.2 runs on a Windows server with 16 CPUs and 16 GB of RAM available (but only 4 GB are assigned to MEMSIZE). The options THREADS, CPUCOUNT=ACTUAL and COMPRESS=CHAR are set for the session.
The biggest data set that is extracted and transformed is about 20 GB (and keeps growing every day, since new data gets inserted into the Oracle database). The data set page size is 8192 and the number of data set pages is 2149668. I would like to set optimal values of BUFSIZE/BUFNO for the whole session, not per data set.
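For reference, a rough sketch of the setup described above (the libname name, connection details and READBUFF value are placeholders, not my real configuration):

```sas
/* Oracle source views - server, credentials and READBUFF value are placeholders */
libname orasrc oracle path=orclsrv user=etl_user password="XXXX" readbuff=5000;

/* Session options already in place, as mentioned above */
options threads cpucount=actual compress=char;
```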
Thanks a lot in advance for your advice.
07-17-2015 03:20 PM
It is about understanding the hardware and the processes. You have two of them: getting data from Oracle into SAS, and working with SAS data sets.
All data is transferred in blocks with sizes of 512 bytes, 4K, 64K and more. The less conversion between block sizes is needed, the better the performance will be.
The more data you can keep in memory, and the more you minimize the needed I/O, the better the performance will be.
There must be some transfer size between Oracle and SAS. You can check it and optimize that one, minimizing locking and conversions.
You do not seem to have issues with this part, so save those possible questions for later.
As it is a DBMS, the data is stored in pages of some size; tuning that is DBA work. Funnily enough, SAS takes the same page-size approach.
- Setting the page size low (8K) adds overhead; you could use 64K instead (see BUFSIZE in SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition).
This setting becomes the page size (as shown by PROC CONTENTS) of newly created data sets.
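A minimal sketch of setting this for the session (the 64K value is an example to benchmark, and WORK.BIG is a placeholder data set name):

```sas
/* New data sets created after this point get 64K pages */
options bufsize=64k;

/* Check the resulting page size and page count */
proc contents data=work.big;
run;
```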
- The BUFNO option is often not used. You could set it to 8, 16 or higher; it controls how many data set pages SAS buffers in memory.
Normally the OS does the caching efficiently enough, but with big data sets and little memory (16 GB of RAM against 20 GB data sets) you can spoil the caching of the OS, which becomes negative for performance. There are notes on that with the SGIO parameter: https://support.sas.com/resources/papers/IOthruSGIO.pdf
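A sketch of a session-wide BUFNO setting (16 is only an illustrative starting point; benchmark 8, 16 and 32 against your workload). Note that on Windows, SGIO itself is set at SAS invocation, not mid-session:

```sas
/* Number of page buffers SAS keeps per open data set */
options bufno=16;
```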
- COMPRESS=BINARY is RDC (Ross Data Compression); see SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition. It is a zip-like approach applied to each record (not the file as a whole). It works well with data sets that have many variables (a long physical record).
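A sketch of trying binary compression, either session-wide or per data set (WORK.BIG is a placeholder name; compare CHAR against BINARY on your own data, since the winner depends on the record layout):

```sas
/* Session-wide default for new data sets */
options compress=binary;

/* Or per data set, overriding the session default */
data work.big_rdc(compress=binary);
    set work.big;
run;
```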
- A pity you are not on 9.3 or up; see SAS Note 46954 - SAS® system option settings for best performance in UNIX and Windows environments. The ALIGNSASIOFILES option is effective for aligning data set pages to the OS, minimizing the conversions at that interface. The IBUFSIZE and UBUFSIZE settings are new with 9.4, where those options become effective for the other internally used data (indexes and utility files).
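For completeness, a sketch of those newer options for when you upgrade (they are not available on 9.2, and the values shown are illustrative assumptions, not recommendations):

```sas
/* SAS 9.3 and later: align data set pages to OS I/O boundaries */
options alignsasiofiles;

/* SAS 9.4: buffer sizes for index files and utility files (example values) */
options ibufsize=32767 ubufsize=64k;
```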
- MEMSIZE is the setting that limits SAS's internal memory usage. Some statistical procedures will need more. You could optimize sorting by letting it use as much of that memory as possible (MEMSIZE 4G, SORTSIZE 3G).
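A sketch of that sort tuning (SORTSIZE can be changed within the session, but MEMSIZE itself can only be set at SAS invocation or in the config file; 3G is just the value suggested above):

```sas
/* Let PROC SORT use up to 3 GB of the 4 GB MEMSIZE */
options sortsize=3G;
```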