03-15-2017 02:52 AM
SAS 9.3 on Windows
I'm asking this on behalf of a Technical Architect colleague. He is documenting the details for a new server, and asked if SAS had a BLKSIZE option. His concern is SAS memory consumption on the server. (That last sentence may be a red herring - comment on that if you want, but my main questions revolve around BUFSIZE vs BLKSIZE)
I found these links in the doc:
However, for the LIBNAME statement itself, the only references I could find for BLKSIZE were in the z/OS Companion and the first link above. I couldn't find any reference to BLKSIZE in the Base documentation or the Windows Companion.
* does blksize make any difference? ;
%macro code;
   libname foo "C:\Temp" blksize=&word;
   data foo.data4_&word (label="Libname option: blksize=&word");
      %dataset
   run;
%mend;
%loop(1024 10240 102400 1024000 10240000 102400000 1024000000 /*10240000000*/)
(Uncomment that last BLKSIZE value, if you dare...)
proc sql;
   select * from dictionary.tables where libname='FOO';
quit;

proc contents data=foo._all_ details;
run;
The BLKSIZE option for the LIBNAME statement appeared to have no effect.
1) Are BUFSIZE and BLKSIZE functionally equivalent? Their descriptions seem similar to me:
Input/output block size (BLKSIZE=) For Windows, UNIX, and z/OS environments, you can specify the number of bytes that are physically read during an I/O operation. The default is 8 kilobytes, and the maximum value is 1 megabyte.
The BUFSIZE system option enables you to specify the permanent buffer page size for output SAS data sets. Under Windows, the value can range from 512 bytes to 2,147,483,647 bytes. Using the default value of 0 optimizes the buffer page size by enabling the engine to pick a value depending on the size of the observation. Experienced users might want to vary the value of the BUFSIZE system option if you are trying to maximize memory usage or the number of observations per page.
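To connect the two descriptions: BUFSIZE is baked into a dataset at creation time and is visible afterwards. A quick sketch (the path, libref, and dataset name are placeholders):

```sas
/* Create a dataset with an explicit page size (BUFSIZE as a dataset option) */
libname demo "C:\Temp";   /* placeholder path */

data demo.test (bufsize=16384);
   do i = 1 to 1000;
      output;
   end;
run;

/* PROC CONTENTS reports it as "Data Set Page Size" */
proc contents data=demo.test;
run;

/* DICTIONARY.TABLES exposes the same value in its BUFSIZE column */
proc sql;
   select memname, bufsize
   from dictionary.tables
   where libname = 'DEMO';
quit;
```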
2) Is the BLKSIZE option meant to have any effect on Windows? For example, on my system the physical dataset size was identical for all values of BLKSIZE.
3) And if so, what utility (DATASETS? CONTENTS?) do I use to see a dataset's BLKSIZE?
03-15-2017 03:27 AM
First of all, SAS memory consumption is controlled by the MEMSIZE option, which has to be set at startup (config file, command line, or environment variables). A single SAS process won't go beyond that, period. So your architect has to calculate, from the number of concurrent SAS processes and their individual requirements, how to set MEMSIZE and how much RAM will be needed.
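For reference, the current limit can be checked inside a running session, although MEMSIZE itself can only be set at startup (e.g. -MEMSIZE 2G in the config file or on the command line). A sketch:

```sas
/* Display the MEMSIZE setting of the running session */
proc options option=memsize value;
run;

/* Or grab it programmatically */
%put MEMSIZE is %sysfunc(getoption(memsize));
```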
I recommend getting in touch with your SAS representative for in-depth support in this matter. SAS also provides papers that help in setting up an optimized SAS infrastructure for all supported platforms; e.g., I've gained a lot of valuable insight from those papers concerning the AIX platform.
blksize controls allocation of buffers for SAS datasets in a library.
Both these parameters depend on the blocksize of your filesystem(s) and the page sizes of datasets. They are so much below the typical requirements of a SAS process (think 4 K BUFSIZE vs. 256 M MEMSIZE) that the effect on overall memory consumption is negligible, IMO. Especially in the light of the MEMSIZE limit. The effect of BUFSIZE and BLKSIZE comes in terms of I/O operations and therefore I/O performance.
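One way to see that the effect is on I/O rather than memory is to read the same dataset with different buffer counts under FULLSTIMER (a sketch; the library path and dataset name are placeholders):

```sas
options fullstimer;              /* detailed timing/memory stats in the log */

libname demo "C:\Temp";          /* placeholder path */

/* BUFSIZE is fixed when a dataset is created; on read you can still
   vary BUFNO, the number of page buffers allocated for the open file */
data _null_;
   set demo.big (bufno=1);       /* one page buffer */
run;

data _null_;
   set demo.big (bufno=10);      /* ten page buffers, fewer I/O requests */
run;
```

Compare the real and CPU times reported in the log for the two steps.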
The biggest memory consumers on an all-in-one SAS server are the Java processes of the Web Application Server and the Web Infrastructure.
Setting MEMSIZE is largely determined by the type of SAS usage:
- mainly web apps, with few pooled servers -> large MEMSIZE
- mainly individual workspace server, lots of users, lots of concurrent processes -> smaller MEMSIZE
Our current MEMSIZE for the workspace server (SAS 9.2) is 192 M, as I have to accommodate up to 40 concurrent WS processes with 32 G of RAM (and I want as much file cache from the system as I can get).
03-16-2017 02:02 AM
Thanks Kurt for your detailed reply.
I'm aware of the MEMSIZE option, but was just passing on the architect's initial comments. TBH, I just heard the comments in passing; he was wondering if SAS could "set the block size", and it sounded like he might put negative comments about SAS in his document. He is more experienced with Oracle than SAS, which likely has different memory management issues. I also know they are using a SAS partner more experienced in architecture than I am.
Anyway, that triggered me to test a bit and, as I searched the doc and ran my test code, I found the information and issues I raised in my OP.
From my research and test code, I think I've got a good grasp of the BUFSIZE (and BUFNO) options.
From your reply, I'm still unsure what BLKSIZE does.
Can you expand on "...controls allocation of buffers for SAS datasets in a library". Is it a property of the dataset? Or a transient/runtime property of the library?
Lastly, I couldn't find any good documentation on it, except for the z/OS Companion: http://support.sas.com/documentation/cdl/en/hosto390/65144/HTML/default/viewer.htm#p1icm2431sjh6bn1d....
As far as I can tell, it has no relevance in Windows, although it doesn't generate any error.
03-16-2017 03:56 AM
Yes, I also think that blksize is mostly, if not completely, irrelevant on systems where files are always considered to be an amorphous stream of bytes (as far as a user sees them).
With z/OS, a lot of the internal structure of files is handled by the system itself. The only files that are basically a stream of bytes, as in other systems, are the members of so-called partitioned datasets, and partitioned datasets are what SAS uses for libraries on z/OS. The individual datasets are then stored as members of the PDS.
Since the PDS is declared on system level with a certain blocksize, and that has an effect on how the system reads and writes, setting the blksize for a library can have great effects on the I/O performance of that library. Another point of consideration for the blksize is the size of hard disk tracks(!).
So the blksize is mainly used to synchronize SAS I/O with the system I/O, so you don't force the system to write two blocks (one of them only half full) for one of yours, and then rewrite the half-full block on the next access.
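For example, half-track blocking is the classic way to avoid that waste on z/OS; for 3390 DASD a half-track block is commonly 27998 bytes, so two blocks fill a track exactly. A sketch (the dataset name is a placeholder):

```sas
/* z/OS: allocate a library with half-track blocking for 3390 DASD */
libname mylib 'USERID.SAS.LIBRARY' blksize=27998;
```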
All this is irrelevant on modern systems that completely hide physical structures from end-user programs, and where even a multitude of physically different storage devices can be used in one logical volume, so parts of your file could reside on physically different devices.
DBMS systems don't use files, but raw disk volumes, and then it can become very important to handle the hardware efficiently. This might be the background of the architect's question.
The last answer I got from our SAN admins, when I asked them how to structure my volume groups on the new server, was "Don't bother. The SAN infrastructure will detect problems, e.g. hotspots, by itself and fix them by physically relocating your data. It does that on its own."