GeorgeSAS
Lapis Lazuli | Level 10

Hello everyone,

The COMPRESS=YES option saves around 50% of the space for a large dataset, but the program runs longer and needs more time to finish.

May I ask approximately how much more CPU time is needed to use COMPRESS=YES on a large SAS dataset, if the original program takes 2 hours to run without COMPRESS=YES?

Thanks

Reeza
Super User

Without knowing the specifics, there's no way to know; it depends on exactly what's going on in your program.

You can test it. I *think* SAS writes a note to the log saying whether compression decreased or increased the size of the data set, and in some cases it disables compression when the overhead would make the data set bigger. That being said, you should still test it.

AlanC
Barite | Level 11

The space savings and the impact on CPU will vary depending upon the data encountered.

 

If you have a dataset with loads of empty string values, for example, lastName, then it 'might' save you space. It depends on how much space is held by the string fields.

 

For example (end of field denoted by *):

Smith                         *
Westinghouse                  *

With compression, it becomes:

Smith*
Westinghouse*

That is not 100% accurate but it illustrates the point.

 

If you have a dataset with nothing but numerics, it will usually buy little or no change in space and still cost CPU overhead. Keep in mind, I am simplifying what happens here: it is more complex.
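To make the padding example concrete, here is a minimal Python sketch of run-length encoding (RLE), the method SAS documents for COMPRESS=YES/CHAR. The (count, byte) format below is a toy made up for illustration, not the layout SAS actually writes, and the 40-byte field length and the sample numbers are hypothetical. It shows blank-padded strings shrinking, while 8-byte doubles with no repeated bytes grow, because every byte pays for its own count.

```python
import struct


def rle_encode(data: bytes) -> bytes:
    """Toy run-length encoder: each run becomes a (count, byte) pair.

    Illustration only: this is not the on-disk format SAS uses.
    Runs are capped at 255 so the count fits in one byte.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


# Character fields padded with blanks to a hypothetical declared length of 40.
for name in (b"Smith", b"Westinghouse"):
    field = name.ljust(40)
    print(name.decode(), len(field), "->", len(rle_encode(field)))
# prints: Smith 40 -> 12, then Westinghouse 40 -> 26

# Non-round doubles (8 bytes each) rarely contain runs, so the toy RLE output
# is larger than the input: every byte needs its own count.
packed = b"".join(struct.pack("<d", v) for v in (3.14159, 2718.28, 1.41421))
print("doubles", len(packed), "->", len(rle_encode(packed)))
```

Round values such as 0 or 1 stored as doubles do contain runs of zero bytes, which is one reason compression can still help numerics a little.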

 

 

Test your datasets. Datasets with lots of long string fields are good candidates for compression.

https://github.com/savian-net
Astounding
PROC Star

There may be another way.  COMPRESS=YES primarily works on character strings.  (It has some impact on numerics as well, but most of the savings come from long character variables that contain a bunch of blanks.)  If you are saving 50% by using compression, it is likely you could save much of that space by selecting shorter lengths for some of your variables.  It would take some inspection of the data to see whether better lengths are possible and what they should be.  But you would gain some of the benefits of compression without having to actually compress the data.  If that turns out to be the case, you could save both space and time by shortening the variables.
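The inspection described above boils down to finding the longest value actually stored in each character variable. A minimal Python sketch of that check follows; the column names, declared lengths and rows are hypothetical. In SAS itself, the LENGTH function (which ignores trailing blanks) in PROC SQL or a DATA step would play the role of `rstrip()` here.

```python
def max_used_lengths(rows, declared):
    """Longest non-blank content per character column.

    Never returns less than 1, since a SAS character variable
    needs a length of at least 1.
    """
    used = {name: 1 for name in declared}
    for row in rows:
        for name in declared:
            used[name] = max(used[name], len(row[name].rstrip()))
    return used


# Hypothetical declared lengths and data.
declared = {"last_name": 50, "city": 40}
rows = [
    {"last_name": "Smith", "city": "Cary"},
    {"last_name": "Westinghouse", "city": "Las Vegas"},
]

used = max_used_lengths(rows, declared)
print(used)  # {'last_name': 12, 'city': 9}
print(sum(declared.values()), "->", sum(used.values()), "bytes per row")  # 90 -> 21
```

The observed maximum only reflects the data you have today; leave some headroom if longer values can arrive later, or they will be truncated.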

Tom
Super User

In general I find that if compression saves 50% of the disk space, using it will also REDUCE run times.

That is because most SAS programs are I/O bound and not CPU bound, so having to read and/or write fewer disk blocks will make the program run faster even if it uses more CPU time.
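This reasoning can be put as rough arithmetic. The Python sketch below uses a deliberately simple model (elapsed time = I/O time + CPU time, with no overlap) and hypothetical numbers: the 2-hour run from the question is assumed to split into 96 minutes of I/O and 24 minutes of CPU, compression is assumed to halve the I/O, and the extra CPU cost of compressing is assumed to be 30%. None of these figures are measurements; for a real step, the SAS log reports real time versus CPU time (the FULLSTIMER system option adds more detail).

```python
def estimated_elapsed(io_seconds, cpu_seconds, size_ratio, cpu_overhead):
    """Toy model: I/O scales with data size, CPU grows by the compression overhead."""
    return io_seconds * size_ratio + cpu_seconds * (1 + cpu_overhead)


def break_even_overhead(io_seconds, cpu_seconds, size_ratio):
    """CPU overhead (as a fraction) at which the compressed run is exactly as slow."""
    return io_seconds * (1 - size_ratio) / cpu_seconds


# Hypothetical split of the 2-hour (7200 s) run from the question.
io_s, cpu_s = 5760, 1440

baseline = estimated_elapsed(io_s, cpu_s, size_ratio=1.0, cpu_overhead=0.0)
compressed = estimated_elapsed(io_s, cpu_s, size_ratio=0.5, cpu_overhead=0.3)
print(f"uncompressed: {baseline / 60:.0f} min, compressed: {compressed / 60:.0f} min")
# prints: uncompressed: 120 min, compressed: 79 min
print(f"break-even CPU overhead: {break_even_overhead(io_s, cpu_s, 0.5):.0%}")
# prints: break-even CPU overhead: 200%
```

Under these assumptions the compressed run only gets slower if compression more than triples the CPU time. For a CPU-bound program, where the I/O share is small, the break-even overhead shrinks toward zero, which is the case where compression costs time.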

ChrisNZ
Tourmaline | Level 20

>In general I find that if compression saves 50% of the disk space using it will also REDUCE run times.

 

Indeed. CPU speeds have progressed a lot faster than storage speeds.

 

If you want compression ratios that routinely reach 80-90% on ordinary (numeric and character) data, with the associated speed increases, SPDE's binary compression is what you need.
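SAS documents COMPRESS=BINARY as using Ross Data Compression (RDC), which looks for repeated byte patterns rather than only runs of a single character, so it can also pick up repetition in numeric data. Python has no RDC implementation in its standard library, so the sketch below uses zlib (DEFLATE) purely as a stand-in pattern-matching compressor. The table is hypothetical and deliberately repetitive (codes and prices drawn from a small set), so treat the ratio it prints as a best case, not a prediction for SPDE or any real dataset.

```python
import struct
import zlib

# Hypothetical numeric table: (region_code, price, quantity) as 8-byte doubles,
# with values repeating in a short cycle, as codes and price lists often do.
regions = [1.0, 2.0, 3.0, 4.0]
prices = [19.99, 24.5, 99.95]
rows = [(regions[i % 4], prices[i % 3], float(i % 7)) for i in range(10_000)]

raw = b"".join(struct.pack("<3d", *row) for row in rows)
packed = zlib.compress(raw)  # DEFLATE as a stand-in; SAS uses RDC for COMPRESS=BINARY

assert zlib.decompress(packed) == raw  # lossless round trip
saving = 1 - len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({saving:.0%} smaller)")
```

Random-looking or high-precision measurements would compress far less, so the advice above still applies: test on your own data.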

 

GeorgeSAS
Lapis Lazuli | Level 10
This is very interesting. I will test it soon.

Thanks!
