ben12
Fluorite | Level 6

Hi everyone. So, there is a bit of disagreement in my work unit regarding the effectiveness of using OPTIONS COMPRESS=YES in order to speed up SAS code. I would like to get some expert opinions to try to resolve this discussion. 

 

Obviously, when using data stored on disk, compression is a good idea, because the performance bottleneck is often I/O-related. So reading and writing a compressed dataset makes sense. There's no disagreement about that. 

 

But does compression speed up the performance of DATA/PROC steps when the dataset has already been loaded into memory? I don't see how compression could speed performance up in this case. In fact, it seems to me that a compressed in-memory dataset would actually slow down SAS procedures because of the extra processing required to decompress each observation. 

 

So, to clarify, consider the following two code blocks: 

 

Example 1: 

========

 

options compress = yes;

data work.d1;
  var1 = var2 * var3;
run;

proc freq data = work.d1;
  table var1;
run;

 

Example 2:

========

 

options compress = no;

data work.d1;
  var1 = var2 * var3;
run;

proc freq data = work.d1;
  table var1;
run;

Will example 1 run any faster than example 2 because of OPTIONS COMPRESS=YES? 

 

Thanks everyone for your time. 

Ksharp
Super User

Yes. OPTIONS COMPRESS=YES would speed up SAS code because SAS would have more calculated resource from the system.

PaigeMiller
Diamond | Level 26

From the documentation at https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=lesysoptsref&docsetTarget=n0u...

 

Compressing a file is a process that reduces the number of bytes required to represent each observation. Advantages of compressing a file include reduced storage requirements for the file and fewer I/O operations necessary to read or write to the data during processing. However, more CPU resources are required to read a compressed file (because of the overhead of uncompressing each observation), and there are situations when the resulting file size might increase rather than decrease.

 

So, it ought to take longer to process a compressed file than a non-compressed file.
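Whether the I/O saving outweighs the decompression cost depends on the system, so it can also be measured directly. A minimal sketch, assuming hypothetical test data generated on the fly (the data set names work.test_nocomp and work.test_comp and their contents are made up for illustration): FULLSTIMER makes SAS write real time and CPU time for each step to the log, so the two PROC FREQ runs can be compared side by side.

```sas
/* Hypothetical timing test: FULLSTIMER writes real time, CPU time  */
/* and memory usage for every step to the SAS log.                   */
options fullstimer;

/* Write the same rows to an uncompressed and a compressed copy.    */
/* Short values in a $200 variable leave mostly trailing blanks,    */
/* which usually compress well. Adjust the row count to taste.      */
data work.test_nocomp (compress=no)
     work.test_comp   (compress=yes);
  length id 8 text $200;
  do id = 1 to 1000000;
    text = cats('value', mod(id, 10));
    output;                 /* no argument: writes to both data sets */
  end;
run;

/* Read each copy back with the same procedure, then compare the    */
/* real time and CPU time reported in the log for these two steps.  */
proc freq data=work.test_nocomp;
  tables text;
run;

proc freq data=work.test_comp;
  tables text;
run;

options nofullstimer;
```

A step that shows real time well above CPU time is waiting on I/O and is the kind of step compression can help; a step where CPU time dominates is more likely to be slowed by decompression.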

--
Paige Miller
ben12
Fluorite | Level 6

Ksharp wrote: "Yes. OPTIONS COMPRESS=YES would speed up SAS code because SAS would have more calculated resource from the system."

I don't understand what you mean. What is a 'calculated resource', and why would SAS have more of this for a compressed file in memory? Can you please explain further? 

 

If a file is compressed, then to access its records, the processor must decompress every record. This requires extra CPU resources. If a file is already in memory, then there is no I/O advantage. So why would data which needs to be decompressed run faster than data which does not need to be decompressed? It's more work for the CPU. This doesn't make any sense. 

34reqrwe
Quartz | Level 8

The WORK library is on disk, not in memory.

DaveHorne
SAS Employee

Generally, I find a net time savings using compress=yes, since the I/O reduction is usually larger than the CPU "cost". What I've seen to be a bigger drag on performance is the observation length of the data (whether it's compressed or not). Extracts from third-party databases (e.g., Oracle) often bring in variables with lengths far greater than they need to be. That's why I often use the %chg_length macro to trim variables to their maximum actual length right away, before I do any further processing with the data set. There is a small "cost" to this as well, but if you have a lot of character variables longer than $200 that don't need to be, the run-time savings can be significant.
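The %chg_length macro itself is not shown in the thread, so the following is not that macro. It is a hand-rolled sketch of the same idea for a single variable, assuming a hypothetical data set work.extract with a hypothetical character variable comment: find the longest value actually stored, then re-declare the variable with that length before the SET statement.

```sas
/* Find the longest value actually stored in the hypothetical      */
/* variable COMMENT of the hypothetical data set WORK.EXTRACT.      */
proc sql noprint;
  select max(length(comment)) into :maxlen trimmed
  from work.extract;
quit;

/* A LENGTH statement placed before SET gives COMMENT the new       */
/* length. SAS may warn about multiple lengths, but no value is     */
/* truncated because MAXLEN is the longest value found above.       */
data work.extract_trim;
  length comment $&maxlen;
  set work.extract;
run;
```

Doing this for many variables at once is where a macro such as %chg_length earns its keep; the single-variable version only shows the mechanism.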

 

Of course, all the usual performance techniques help as well, such as KEEP/DROP, WHERE clauses, and indexes.
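A short sketch of those techniques, assuming a hypothetical data set work.big with variables id, region and amount: KEEP= and WHERE= limit what is read, and PROC DATASETS builds a simple index that later WHERE processing may be able to use.

```sas
/* Read only the needed columns and rows from the hypothetical      */
/* data set WORK.BIG; KEEP= and WHERE= are applied as data are read. */
data work.east;
  set work.big (keep=id region amount
                where=(region = 'EAST'));
run;

/* Build a simple index on REGION. SAS decides per query whether    */
/* using the index is cheaper than reading every observation.       */
proc datasets library=work nolist;
  modify big;
    index create region;
  run;
quit;
```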

Tom
Super User Tom
Super User

You haven't shown any code that would load any datasets into memory.
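For reference, Base SAS has a statement that does load a data set into memory: SASFILE. A minimal sketch reusing the name work.d1 from the question: the data set is loaded into buffers once, the procedure reads it from memory, and the memory is then released.

```sas
/* Load WORK.D1 into memory; later steps read it from the buffers.  */
sasfile work.d1 load;

proc freq data=work.d1;
  tables var1;
run;

/* Release the memory when the data set is no longer needed.        */
sasfile work.d1 close;
```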

34reqrwe
Quartz | Level 8

Exactly.

 

It is true that compression doesn't help when the data is in memory. We turn it off when loading data into CAS. 

 

However, these datasets are not in memory.

ben12
Fluorite | Level 6

OK - I always thought that the WORK library was in-memory, since it is volatile, but is that not the case? You have to use options like MEMLIB to force the SAS server to use RAM for the WORK library, right? 

 

I'm running my code in SAS EG. I don't have any control over the administration of the SAS EG server. Is there a way I can check the settings for the MEMLIB option from the user side? I tried running PROC OPTIONS, but there was no listing for MEMLIB or MEMCACHE in the output. Does that mean that the option has not been set, and so the WORK library is on disk? 

34reqrwe
Quartz | Level 8

Hi - if you run this command it will tell you the physical location of your work library:

libname work list;
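If only the path is needed, the PATHNAME function returns it directly, and PROC OPTIONS shows the current WORK system option. A small sketch:

```sas
/* Write the physical path of the WORK library to the log.          */
%put WORK is located at: %sysfunc(pathname(work));

/* Show the current setting of the WORK system option.              */
proc options option=work;
run;
```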

Generally, in SAS EG, using memory for the WORK library would not be a good idea: consider the size of your memory versus the size of the datasets you may want to store.

 

Check out SAS Viya for the way forward.

 

Cheers

ben12
Fluorite | Level 6

Thank you, and yes: this does print a physical file name, so it appears that WORK is in fact on-disk. 

 

Thanks everyone for your help. This has been a learning experience for me! 

DaveHorne
SAS Employee

To check if MEMLIB is turned on:

proc options option=memlib;run;

The "libname work list" statement will always show a physical path, because miscellaneous utility files can still be placed there (the SAS data sets themselves will be in memory). I have a server with 256GB of RAM, so I run with MEMLIB quite often. My comment about observation length applies especially to in-memory files, since an uncompressed data set might exhaust your RAM (and the job will fail with an out-of-WORK-space error message).

ben12
Fluorite | Level 6

DaveHorne wrote:

To check if MEMLIB is turned on:

proc options option=memlib;run;

That gives me an error message: 

ERROR: Unrecognized SAS option name MEMLIB.

So, I assume that means that MEMLIB is not switched on, and hence the WORK library must be on-disk? 

DaveHorne
SAS Employee

@ben12, it sounds like MEMLIB is not supported in the version of SAS you are using (so WORK would be on disk).

 

One alternative that I played with before MEMLIB was creating a RAM disk (with a third-party utility) and then pointing the -WORK option to that location. If you have enough memory, it does work.
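To see which memory-related system options a particular installation does recognize, PROC OPTIONS can list a whole option group instead of a single option. A sketch; the options listed will vary by host operating system and SAS release:

```sas
/* List the memory-related system options known to this session.   */
proc options group=memory;
run;
```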


Discussion stats
  • 13 replies
  • 4681 views
  • 11 likes
  • 6 in conversation