Character vs Binary compression

Respected Advisor
Posts: 4,173

Hi

I would have expected binary compression to reduce the size of the data set in the example below more than character compression does.


Actually, the opposite is true (SAS 9.2, Windows 7, 64-bit):

NOTE: Compressing data set WORK.HAVEB decreased size by 45.84 percent.
Compressed is 645 pages; un-compressed would require 1191 pages.

NOTE: Compressing data set WORK.HAVEC decreased size by 51.97 percent.
Compressed is 572 pages; un-compressed would require 1191 pages.


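/* Binary compression: COMPRESS=BINARY uses RDC (Ross Data Compression) */
data haveB(COMPRESS=binary);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;

/* Character compression: COMPRESS=YES (same as CHAR) uses run-length encoding */
data haveC(COMPRESS=yes);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;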


Can someone explain this behaviour to me?

Thanks
Patrick
Valued Guide
Posts: 2,177

Re: Character vs Binary compression

To implement compression of either type, SAS stores information about where and how much compression was applied, along with the compressed values, in each page of data that gets written.
There is also row-level overhead.
Your experience will vary depending on the mix of data types and how wide or narrow the row is. I find character compression adequate for balancing CPU against I/O.
ymmv
peterC
SAS Super FREQ
Posts: 8,868

Re: Character vs Binary compression

We have an example in our advanced programming/efficiencies course where this overhead actually makes the data set larger rather than smaller. So "it depends" or "your mileage may vary" or "you gotta benchmark it on your data" is about all anyone can say. I believe there are several good explanations of compression and its overhead out and about in papers, doc, etc., including these:
http://www2.sas.com/proceedings/sugi28/003-28.pdf
http://support.sas.com/resources/papers/proceedings09/065-2009.pdf
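
To make "benchmark it on your data" concrete, here is a minimal sketch of one way to compare the settings side by side. The data set names (work.base, work.t_no, work.t_char, work.t_bin) and the test data are made up for illustration; substitute your own table for work.base. It only reports what SAS did and does not predict which setting wins.

```sas
/* Hypothetical test data: replace work.base with your own table */
data work.base;
  do i = 1 to 100000;
    a = 'x';
    output;
  end;
run;

/* Write the same rows once per COMPRESS= setting */
data work.t_no  (compress=no)
     work.t_char(compress=char)
     work.t_bin (compress=binary);
  set work.base;
run;

/* Compare observation length, page count and percent compression */
proc sql;
  select memname, compress, obslen, npage, pcompress
  from dictionary.tables
  where libname = 'WORK'
    and memname in ('T_NO', 'T_CHAR', 'T_BIN');
quit;
```

The log NOTEs for the DATA step give the same percentages; the query just puts them in one table next to the page counts.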

cynthia
Respected Advisor
Posts: 4,173

Re: Character vs Binary compression

Posted in reply to Cynthia_sas
Hi Cynthia, Peter

Thanks for your answers.

And yes: RTM - got it.

Seems I was a bit naive to assume that binary compression uses a stronger compression algorithm in general (I had too much zip compression in mind).

Thanks
Patrick
Regular Contributor
Posts: 171

Re: Character vs Binary compression

As a very general rule of thumb, binary compression is not effective unless the observation length is greater than a few hundred bytes. Of course, there are many additional considerations and I’m sure that someone could come up with some counterexamples. See http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/viewer.htm#a001288760.htm for details.
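
A rough way to try the rule of thumb yourself is to repeat the original test with a much wider, mostly numeric row. This is only a sketch under made-up assumptions: the data set names (work.wide_char, work.wide_bin), the 100 numeric variables and the values are all hypothetical, and the result will depend on your data.

```sas
/* Hypothetical wide row: 100 numeric variables plus i, about 800 bytes per observation */
data work.wide_char(compress=char)
     work.wide_bin (compress=binary);
  array n{100} n1-n100;
  do i = 1 to 10000;
    do j = 1 to 100;
      n{j} = mod(i + j, 7);   /* small integers, so the stored bytes are repetitive */
    end;
    output;
  end;
  drop j;
run;
```

Compare the two "Compressing data set ..." NOTEs in the log, then shrink the array to a handful of variables and run it again to see how the picture changes with observation length.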