Patrick
Opal | Level 21
Hi

I would have expected that in the example below binary compression reduces the size of the data set more than character compression.


Actually the opposite is true (SAS 9.2, Windows 7, 64-bit):

NOTE: Compressing data set WORK.HAVEB decreased size by 45.84 percent.
Compressed is 645 pages; un-compressed would require 1191 pages.

NOTE: Compressing data set WORK.HAVEC decreased size by 51.97 percent.
Compressed is 572 pages; un-compressed would require 1191 pages.


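/* COMPRESS=BINARY: Ross Data Compression (RDC) */
data haveB(COMPRESS=binary);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;

/* COMPRESS=YES (same as CHAR): Run Length Encoding (RLE) */
data haveC(COMPRESS=yes);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;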


Can someone explain this behaviour to me?

Thanks
Patrick
Peter_C
Rhodochrosite | Level 12
To implement compression of either type, SAS stores information about where and how much compression was applied, along with the compressed values, in each page of data that gets written.
There is also row-level overhead.
Your experience will vary depending on the mix of data types and how wide or narrow the row is. I find character compression adequate for balancing CPU and I/O.
ymmv
peterC
Cynthia_sas
Diamond | Level 26
We have an example in our advanced programming/efficiencies course where this overhead actually makes the size increase. So "it depends" or "your mileage may vary" or "you gotta benchmark it on your data" is about all anyone can say. I believe there are several good explanations of compression and its overhead in papers, documentation, etc., including these:
http://www2.sas.com/proceedings/sugi28/003-28.pdf
http://support.sas.com/resources/papers/proceedings09/065-2009.pdf
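
A minimal sketch of what "benchmark it on your data" could look like, assuming a hypothetical table WORK.SAMPLE stands in for a representative sample of your own data. The data set names C_NO, C_CHAR and C_BIN are made up for illustration. The log NOTEs report the percentage change for the compressed copies, and the query lists the page counts side by side.

```sas
/* Hypothetical test data: replace with a sample of your own table */
data work.sample;
  length a $200;
  do i = 1 to 100000;
    a = repeat('x', 49);   /* 50 x's, padded with blanks to 200 bytes */
    output;
  end;
run;

/* Write the same rows three times, once per compression setting */
data work.c_no  (compress=no)
     work.c_char(compress=char)
     work.c_bin (compress=binary);
  set work.sample;
run;

/* Compare the number of pages each copy needs */
proc sql;
  select memname, compress, npage
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('C_NO', 'C_CHAR', 'C_BIN');
quit;
```

Whichever copy has the fewest pages wins on size for that data; CPU time for reading and writing is a separate cost to check in the log.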

cynthia
Patrick
Opal | Level 21
Hi Cynthia, Peter

Thanks for your answers.

And yes: RTM - got it.

Seems I was a bit naive to assume that binary compression uses a stronger compression algorithm in general (I had too much zip compression in mind).

Thanks
Patrick
polingjw
Quartz | Level 8
As a very general rule of thumb, binary compression is not effective unless the observation length is greater than a few hundred bytes. Of course, there are many additional considerations, and I'm sure that someone could come up with some counterexamples. See http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/viewer.htm#a001288760.htm for details.
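
To try the rule of thumb against the original example, here is a sketch with a much wider observation. The names wideC, wideB and n1-n50 are hypothetical, and the values are arbitrary; 51 numeric variables give an observation length of roughly 400 bytes. No result is claimed here, the two NOTEs in the log show which method does better for this layout.

```sas
/* Hypothetical wide table: 50 numeric variables plus i, about 408 bytes per row */
data wideC(compress=char) wideB(compress=binary);
  array n{50} n1-n50;
  do i = 1 to 100000;
    do j = 1 to 50;
      n{j} = mod(i, 7) * j;   /* arbitrary repeating numeric values */
    end;
    output;
  end;
  drop j;
run;
```

Changing the array size is a quick way to see how the outcome shifts as the row gets wider or narrower.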
