I'm migrating to a new linux server, and I just noticed that a one-record dataset with 3 variables takes up 1.5MB. On the prior server it took 128K.
If I run:
data a ;
length x 8 y $8 z $100 ;
x=1 ;
y='1' ;
z='1' ;
run ;
proc contents data=a ;
run ;
I get:
Data Set Page Size          524288
Number of Data Set Pages    2
Number of Data Set Repairs  0
Filename                    /saswork/.../a.sas7bdat
Release Created             9.0401M6
Host Created                Linux
File Size                   2MB
File Size (bytes)           1572864
I noticed the Data Set Page Size is much bigger than on the prior server. I think it was 65,536 on the old server.
Both prior server and new server are Linux.
Is there a SAS option that determines page size, or is it an OS thing? I have a good number of small datasets with control data. I hate to think that each of them could take 1MB or more to store.
Any other reason a small data set would suddenly take up a lot of disk space?
Hi @Quentin,
On my Windows workstation (with SAS 9.4M5) the BUFSIZE= system option (which can be overridden by the BUFSIZE= data set option) determines data set page size. With the default value 0 (i.e. "minimum optimal buffer size for the operating environment") I get the same result as with BUFSIZE=64k for your test dataset:
Data Set Page Size          65536
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            545
Obs in First Data Page      1
File Size                   128KB
File Size (bytes)           131072
With BUFSIZE=512k I obtain:
Data Set Page Size          524288
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            4364
Obs in First Data Page      1
File Size                   1MB
File Size (bytes)           1048576
I don't know why it's 2 pages on your system.
Data set page size is mostly very close to the BUFSIZE value, e.g. 123904 for BUFSIZE=123456 or 1234944 for BUFSIZE=1234567 (integer multiples of 512, I guess). Smaller values than the "minimum optimal buffer size" are allowed: For example, 32k yields a file size of only 64KB.
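The two patterns in the numbers above can be checked with a little arithmetic. This is only a sketch: the "round up to a multiple of 512" rule is the guess made above, and the "data pages plus one extra page" rule for file size is an inference from the three file sizes reported in this thread, not documented behavior.

```sas
/* Guess 1: page size = BUFSIZE rounded up to the next multiple of 512 */
data _null_ ;
  do bufsize = 123456, 1234567 ;
    pagesize = ceil(bufsize / 512) * 512 ;
    put bufsize= pagesize= ;
  end ;
run ;

/* Guess 2 (inferred, not documented): file size = (data pages + 1) * page size */
data _null_ ;
  input pagesize npages reported ;
  estimate = (npages + 1) * pagesize ;
  put pagesize= npages= estimate= reported= ;
datalines ;
65536 1 131072
524288 1 1048576
524288 2 1572864
;
run ;
```

The first step reproduces 123904 and 1234944, and in the second step the estimate equals the reported byte count for all three cases.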
Thanks @FreelanceReinh .
I confirmed that the prior server had the default BUFSIZE=0 (which chooses the minimum recommended for the OS, apparently 65,536), but on the new server they have set BUFSIZE=524288.
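For anyone who wants to run the same check, here is a minimal sketch of two ways to see the current BUFSIZE setting in a session. Both write to the log.

```sas
/* Show the current BUFSIZE setting; VALUE also reports how the value was set */
proc options option=bufsize value ;
run ;

/* Same setting retrieved with the GETOPTION function */
%put NOTE: BUFSIZE=%sysfunc(getoption(bufsize)) ;
```

Running this on both servers makes it easy to tell whether the difference comes from a site-level setting or from the default.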
I guess they're trying to decrease I/O for processing big data sets. The docs say:
The page size is the amount of data that can be transferred from a single input/output operation to one buffer. The page size is a permanent attribute of the data set and is used when the data set is processed.
A larger page size can improve execution time by reducing the number of times SAS has to read from or write to the storage medium. However, the improvement in execution time comes at the expense of increased memory consumption.
I'll double check with the admins to make sure they're happy with the trade-off, but it's their server.
Was just a surprise yesterday as I was comparing data on the source server to the target server, and realized all these small data sets were taking much more space.
I confirmed that on the target server, if I explicitly set bufsize to 65,536, I get a 192K file again:
data a (bufsize=65536);
length x 8 y $8 z $100 ;
x=1 ;
y='1' ;
z='1' ;
run ;
proc contents data=a ;
run ;
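To see how many small data sets in a library are affected, a query against DICTIONARY.TABLES is one option. This is a sketch under assumptions: the libref CTRL is hypothetical, and it assumes the bufsize, npage, and filesize columns are available in DICTIONARY.TABLES in your release.

```sas
proc sql ;
  /* List data sets with their page size, page count, and file size */
  select memname,
         nobs,
         npage,
         bufsize,
         filesize format=comma15.
    from dictionary.tables
    where libname = 'CTRL'      /* hypothetical libref; must be uppercase */
      and memtype = 'DATA'
    order by filesize desc ;
quit ;
```

Data sets showing few observations but a large bufsize would be the candidates for rewriting with the BUFSIZE= data set option as in the step above.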
Look at the blocksizes of the server filesystems.