
I'm migrating to a new Linux server, and I just noticed that a one-record data set with 3 variables takes up 1.5MB. On the prior server it took 128K.

 

If I run:

data a ;
  length x 8 y $8 z $100 ;
  x=1 ;
  y='1' ;
  z='1' ;
run ;

proc contents data=a ;
run ;

I get:

Data Set Page Size 524288 
Number of Data Set Pages 2 
Number of Data Set Repairs 0 
Filename /saswork/.../a.sas7bdat 
Release Created 9.0401M6 
Host Created Linux 
File Size 2MB 
File Size (bytes) 1572864 

I noticed the Data Set Page Size is much bigger than on the prior server. I think it was 65,536 on the old server.

 

Both prior server and new server are Linux.

 

Is there a SAS option that determines page size, or is it an OS thing? I have a good number of small data sets with control data. I hate to think that each of them could take 1MB to store.

 

Any other reason a small data set would suddenly take up a lot of disk space?

BASUG is hosting free webinars. Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Accepted Solution
FreelanceReinh
Jade | Level 19

Hi @Quentin,

 

On my Windows workstation (with SAS 9.4M5) the BUFSIZE= system option (which can be overridden by the BUFSIZE= data set option) determines data set page size. With the default value 0 (i.e. "minimum optimal buffer size for the operating environment") I get the same result as with BUFSIZE=64k for your test dataset:

Data Set Page Size          65536
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            545
Obs in First Data Page      1
File Size                   128KB
File Size (bytes)           131072

With BUFSIZE=512k I obtain:

Data Set Page Size          524288
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            4364
Obs in First Data Page      1
File Size                   1MB
File Size (bytes)           1048576

I don't know why it's 2 pages on your system.
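
For what it's worth, the three PROC CONTENTS outputs above are all consistent with file size = (number of data set pages + 1) × page size, as if one extra page-sized block were stored in front of the data pages. That reading is only an inference from these numbers, not something taken from the documentation. A quick check, with the values copied from the outputs in this thread:

```sas
/* Sketch: compare reported file size with (pages + 1) * page size.        */
/* The "+ 1" is an assumption inferred from the outputs, not a documented  */
/* rule. Input values are copied from the PROC CONTENTS listings above.    */
data _null_ ;
  input pagesize npages bytes ;
  expected = (npages + 1) * pagesize ;
  put pagesize= npages= bytes= expected= ;
  datalines ;
65536 1 131072
524288 1 1048576
524288 2 1572864
;
run ;
```

In each of the three rows, expected equals bytes. So with a 512k page size, even a one-record data set costs at least 1MB, and the second data page on the Linux server adds another 512k.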

 

Data set page size is usually very close to the BUFSIZE value, e.g. 123904 for BUFSIZE=123456 or 1234944 for BUFSIZE=1234567 (integer multiples of 512, I guess). Values smaller than the "minimum optimal buffer size" are allowed: for example, 32k yields a file size of only 64KB.
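
To see which BUFSIZE= value is in effect, and to override it for a session or for a single data set, something along these lines should work. This is a minimal sketch: the 64k value and the data set name small are only illustrations, and a site may have reasons for the value it has configured.

```sas
/* Show the BUFSIZE= system option value currently in effect */
proc options option=bufsize ;
run ;

/* Override it for the rest of the session (64k is just an example value) */
options bufsize=64k ;

/* Or override it for one data set only; "small" is a hypothetical name */
data small (bufsize=64k) ;
  x=1 ;
run ;
```

Because page size is stored with the data set when it is created, changing the option only affects data sets created afterwards; existing ones keep their page size until they are recreated.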


Quentin
Super User

Thanks @FreelanceReinh .

 

I confirmed that the prior server had the default BUFSIZE=0 (which chooses the minimum recommended for the OS, apparently 65,536), but on the new server they have set bufsize=524288.

 

I guess they're trying to decrease I/O for processing big data sets.  The docs say:

The page size is the amount of data that can be transferred from a single input/output operation to one buffer. The page size is a permanent attribute of the data set and is used when the data set is processed.

 

A larger page size can improve execution time by reducing the number of times SAS has to read from or write to the storage medium. However, the improvement in execution time comes at the expense of increased memory consumption.

I'll double check with the admins to make sure they're happy with the trade-off, but it's their server.  

 

Was just a surprise yesterday as I was comparing data on the source server to the target server, and realized all these small data sets were taking much more space.

 

I confirmed on the target server that if I explicitly set bufsize to 65,536, I get a 192K file again:

 

data a (bufsize=65536);
  length x 8 y $8 z $100 ;
  x=1 ;
  y='1' ;
  z='1' ;
run ;

proc contents data=a ;
run ;
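
To find out which data sets in a library are affected, a query against DICTIONARY.TABLES can list page size, page count, and file size per data set. A sketch, assuming the library of interest is WORK (swap in your own libref) and that the BUFSIZE, NPAGE, and FILESIZE columns are available in your SAS release:

```sas
/* List data sets in one library with their page size and disk footprint. */
/* 'WORK' is only an example libref.                                       */
proc sql ;
  select memname, nobs, bufsize, npage, filesize
    from dictionary.tables
    where libname='WORK' and memtype='DATA'
    order by filesize desc ;
quit ;
```

Small control data sets that show a large BUFSIZE and only one or two pages are the candidates for recreating with a smaller BUFSIZE= data set option.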