
compress =yes CPU time

Super Contributor
Posts: 267

compress =yes CPU time

Hello everyone,

The COMPRESS=YES option saves around 50% of the space for a large dataset, but the program runs longer and needs more time to finish.

May I ask approximately how much more CPU time is needed to use COMPRESS=YES on a large SAS dataset, if the original program takes 2 hours to run without COMPRESS=YES?

Thanks

Super User
Posts: 23,320

Re: compress =yes CPU time

Posted in reply to GeorgeSAS

Without knowing the specifics there's no way to know. It depends on exactly what's going on in your program.

You can test it. I *think* SAS writes a note to the log when compression will not reduce the size of the data set, and will not apply it if it would make the data set larger. That being said, you should still test it.

Regular Contributor
Posts: 150

Re: compress =yes CPU time

Posted in reply to GeorgeSAS

The space savings and the impact on CPU will vary depending on the data encountered.

If you have a dataset with lots of string values that are mostly blank padding, for example, lastName, then it 'might' save you space. It depends on how much of the space is held by the string fields.

For example (end of field denoted by *):

Smith                         *
Westinghouse                  *

With compression, it becomes:

Smith*
Westinghouse*

That is not 100% accurate but it illustrates the point.

If you have a dataset with nothing but numerics, compression will not save space and will cost CPU overhead. Keep in mind, I am simplifying what happens here: it is more complex.

Test your datasets. Datasets with lots of long string fields are good candidates for compression.
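The blank-padding example above can be sketched with a toy run-length encoder. SAS documents COMPRESS=YES (the same as COMPRESS=CHAR) as run-length encoding, but the byte format here is made up for illustration: the 30-byte field width, the 0xFF marker byte and the 3-byte run token are all assumptions, not SAS's on-disk layout.

```python
FIELD_WIDTH = 30  # hypothetical fixed length, like a $30 character variable
MARKER = 0xFF     # hypothetical escape byte; assumes it never occurs in the data


def pad(value: str, width: int = FIELD_WIDTH) -> bytes:
    """Store a string the way a fixed-length character variable does: blank-padded."""
    return value.encode("ascii").ljust(width, b" ")


def rle_encode(data: bytes, min_run: int = 3) -> bytes:
    """Toy RLE: replace each run of >= min_run identical bytes with MARKER, byte, count."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        # Extend the run while the byte repeats (cap at 255 so the count fits one byte).
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= min_run:
            out += bytes([MARKER, data[i], run])
        else:
            out += data[i:j]  # short runs are copied as literals
        i = j
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand MARKER tokens back into runs; copy everything else unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            out += bytes([data[i + 1]]) * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


for name in ["Smith", "Westinghouse"]:
    raw = pad(name)
    packed = rle_encode(raw)
    assert rle_decode(packed) == raw  # lossless round trip
    print(f"{name!r}: {len(raw)} bytes -> {len(packed)} bytes")
```

This prints `'Smith': 30 bytes -> 8 bytes` and `'Westinghouse': 30 bytes -> 15 bytes`. A value with no repeated bytes gains nothing from this kind of encoding, which is why not every data set benefits.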

Super User
Posts: 6,637

Re: compress =yes CPU time

Posted in reply to GeorgeSAS

There may be another way. COMPRESS=YES primarily works on character strings. (It has some impact on numerics as well, but most of the savings come from long character variables that contain a lot of blanks.) If you are saving 50% by using compression, it is likely you could save much of that space by selecting shorter lengths for some of your variables. It would take some inspection of the data to see whether better lengths are possible and what they should be, but you would gain some of the benefits of compression without having to actually compress the data. If that turns out to be the case, you could save both space and time by shortening the variables.
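The inspection step could look something like this minimal Python sketch. It assumes the rows have already been read into dictionaries of blank-padded strings; the column names, the declared length of 50 and the sample values are hypothetical.

```python
def suggest_lengths(rows: list[dict[str, str]]) -> dict[str, int]:
    """Return the longest value seen in each character column, ignoring trailing blanks."""
    longest: dict[str, int] = {}
    for row in rows:
        for col, value in row.items():
            used = len(value.rstrip(" "))  # trailing blanks are just padding
            longest[col] = max(longest.get(col, 0), used)
    return longest


# Hypothetical sample: both columns declared with length 50.
declared = {"lastName": 50, "city": 50}
rows = [
    {"lastName": "Smith".ljust(50), "city": "Cary".ljust(50)},
    {"lastName": "Westinghouse".ljust(50), "city": "Raleigh".ljust(50)},
]

suggested = suggest_lengths(rows)
for col, width in declared.items():
    print(f"{col}: declared {width}, longest value {suggested[col]}")
```

Lengths picked from a sample can truncate longer values that arrive later, so leave some headroom. Inside SAS the same check could be done with the LENGTHN function, which ignores trailing blanks.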

Super User
Posts: 7,936

Re: compress =yes CPU time

Posted in reply to GeorgeSAS


In general I find that if compression saves 50% of the disk space, using it will also REDUCE run times.

That is because most SAS programs are I/O bound, not CPU bound, so having to read and/or write fewer disk blocks will make the program run faster even if it uses more CPU time.
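A back-of-envelope model makes the I/O-bound argument concrete. It assumes run time is simply I/O time plus CPU time with no overlap, and every number in it (data volume, disk throughput, CPU seconds, extra CPU for compression) is hypothetical, picked so the uncompressed run takes the 2 hours from the question.

```python
def run_hours(data_mb: float, io_mb_per_s: float, cpu_s: float) -> float:
    """Crude model: time to move the data through the disk plus CPU time, in hours."""
    return (data_mb / io_mb_per_s + cpu_s) / 3600


# Hypothetical inputs, for illustration only.
DATA_MB = 108_000   # data read and written by the uncompressed program
IO_MB_PER_S = 20    # effective disk throughput
CPU_S = 1_800       # CPU time of the uncompressed program
EXTRA_CPU_S = 900   # assumed extra CPU for compressing/decompressing
SPACE_SAVED = 0.5   # 50% smaller, as in the question

plain = run_hours(DATA_MB, IO_MB_PER_S, CPU_S)
compressed = run_hours(DATA_MB * (1 - SPACE_SAVED), IO_MB_PER_S, CPU_S + EXTRA_CPU_S)
print(f"uncompressed: {plain:.2f} h, compressed: {compressed:.2f} h")
```

This prints `uncompressed: 2.00 h, compressed: 1.50 h`: under these assumptions compression saves 2,700 seconds of I/O for 900 extra CPU seconds. On a fast disk or a CPU-heavy program the balance can tip the other way, which is why testing on your own data matters.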

PROC Star
Posts: 2,318

Re: compress =yes CPU time

>In general I find that if compression saves 50% of the disk space, using it will also REDUCE run times.

Indeed. CPU speeds have progressed a lot faster than storage speeds.

If you want compression ratios that routinely reach 80-90% on non-binary data (numbers and strings), along with the associated speed increases, SPDE's binary compression is what you need.
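A rough way to see why a sliding-window compressor can go further than run-length encoding: it finds repeated multi-byte patterns, not just runs of one repeated byte. In this sketch Python's zlib (DEFLATE) stands in for a sliding-window method; it is not the algorithm SAS uses for binary compression, and the sample data is made up.

```python
import zlib


def rle_encoded_size(data: bytes, min_run: int = 3) -> int:
    """Size after a toy RLE that collapses each run of >= min_run identical bytes to 3 bytes."""
    size = 0
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        size += 3 if run >= min_run else run
        i = j
    return size


# Hypothetical column of short codes: lots of repetition, but no byte repeats back to back.
data = b"".join([b"NC01", b"TX02", b"CA03"] * 1000)

print("raw: ", len(data))
print("RLE: ", rle_encoded_size(data))
print("zlib:", len(zlib.compress(data)))
```

The toy RLE saves nothing here because no byte repeats consecutively, while zlib shrinks the repetitive column to a small fraction of its size.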

 

Super Contributor
Posts: 267

Re: compress =yes CPU time

This is very interesting. I will test it soon.

Thanks!