
I'm migrating to a new Linux server, and I just noticed that a one-record dataset with 3 variables takes up 1.5MB. On the prior server it took 128K.

 

If I run:

data a ;
  length x 8 y $8 z $100 ;
  x=1 ;
  y='1' ;
  z='1' ;
run ;

proc contents data=a ;
run ;

I get:

Data Set Page Size 524288 
Number of Data Set Pages 2 
Number of Data Set Repairs 0 
Filename /saswork/.../a.sas7bdat 
Release Created 9.0401M6 
Host Created Linux 
File Size 2MB 
File Size (bytes) 1572864 

I noticed the Data Set Page Size is much bigger than on the prior server. I think it was 65,536 on the old server.

 

Both prior server and new server are Linux.

 

Is there a SAS option that determines page size, or is it an OS thing? I have a good number of small datasets with control data. I hate to think that they could each take 1MB or more to store.

 

Any other reason a small data set would suddenly take up a lot of disk space?

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Quentin,

 

On my Windows workstation (with SAS 9.4M5) the BUFSIZE= system option (which can be overridden by the BUFSIZE= data set option) determines data set page size. With the default value 0 (i.e. "minimum optimal buffer size for the operating environment") I get the same result as with BUFSIZE=64k for your test dataset:

Data Set Page Size          65536
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            545
Obs in First Data Page      1
File Size                   128KB
File Size (bytes)           131072

With BUFSIZE=512k I obtain:

Data Set Page Size          524288
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            4364
Obs in First Data Page      1
File Size                   1MB
File Size (bytes)           1048576

I don't know why it's 2 pages on your system.

 

Data set page size is mostly very close to the BUFSIZE value, e.g. 123904 for BUFSIZE=123456 or 1234944 for BUFSIZE=1234567 (integer multiples of 512, I guess). Smaller values than the "minimum optimal buffer size" are allowed: For example, 32k yields a file size of only 64KB.
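
To see which value is in effect in a given session and to override it, something like the following sketch should work. The 64k value is only an example, and the data set is the test data set `a` from the question.

```sas
/* Show the current BUFSIZE= system option in the log */
proc options option=bufsize;
run;

/* Same value via the GETOPTION function */
%put NOTE: BUFSIZE=%sysfunc(getoption(bufsize));

/* Override for the rest of the session (64k is an example value) */
options bufsize=64k;

/* Or override for a single output data set only */
data a (bufsize=64k);
  length x 8 y $8 z $100;
  x=1;
  y='1';
  z='1';
run;

/* Check the resulting Data Set Page Size */
proc contents data=a;
run;
```

The data set option takes precedence over the system option, so it can be used for individual small data sets without changing the session default.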


Quentin
Super User

Thanks @FreelanceReinh .

 

I confirmed, on the prior server they had the default BUFSIZE=0 (which chooses minimum recommended for the OS, apparently 65,536), but on the new server they have set bufsize=524288. 

 

I guess they're trying to decrease I/O for processing big data sets.  The docs say:

The page size is the amount of data that can be transferred from a single input/output operation to one buffer. The page size is a permanent attribute of the data set and is used when the data set is processed.

 

A larger page size can improve execution time by reducing the number of times SAS has to read from or write to the storage medium. However, the improvement in execution time comes at the expense of increased memory consumption.

I'll double check with the admins to make sure they're happy with the trade-off, but it's their server.  

 

Was just a surprise yesterday as I was comparing data on the source server to the target server, and realized all these small data sets were taking much more space.
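
For anyone wanting to check which data sets in a library are affected, a query against DICTIONARY.TABLES along these lines might help. This is a sketch that assumes the column names as they appear in SAS 9.4 (the DESCRIBE statement lists them in the log), and WORK is only an example libref.

```sas
proc sql;
  /* Write the column definitions of DICTIONARY.TABLES to the log */
  describe table dictionary.tables;

  /* Page size, page count, and file size per data set.        */
  /* Librefs are stored in uppercase in DICTIONARY.TABLES.     */
  select memname, nobs, bufsize, npage, filesize
    from dictionary.tables
    where libname = 'WORK' and memtype = 'DATA'
    order by filesize desc;
quit;
```

Data sets with few observations but a large FILESIZE are the ones where the bigger page size costs the most space.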

 

I confirmed that on the target server, if I explicitly set BUFSIZE to 65,536, I get a small file again (192K):

 

data a (bufsize=65536);
  length x 8 y $8 z $100 ;
  x=1 ;
  y='1' ;
  z='1' ;
run ;

proc contents data=a ;
run ;
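
As a side note, the file sizes quoted in this thread all happen to equal (number of pages + 1) × page size. That is only a pattern in these three PROC CONTENTS listings, not a documented rule, but it can be checked with a small step like this:

```sas
data _null_;
  /* Page size, page count, and file size in bytes as reported in the */
  /* PROC CONTENTS listings quoted in this thread                     */
  input pagesize npages reported;
  /* Assumed pattern: one extra page on top of the data pages */
  computed = (npages + 1) * pagesize;
  put pagesize= npages= reported= computed=;
  datalines;
65536 1 131072
524288 1 1048576
524288 2 1572864
;
run;
```

If the same pattern holds with BUFSIZE=65536 and 2 pages, it would give 3 × 65,536 = 196,608 bytes, which matches the 192K file mentioned above.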