Patrick
Opal | Level 21
Hi

In the example below, I would have expected binary compression to reduce the size of the data set more than character compression does.


Actually, the opposite is true (SAS 9.2, Win7, 64-bit):

NOTE: Compressing data set WORK.HAVEB decreased size by 45.84 percent.
Compressed is 645 pages; un-compressed would require 1191 pages.

NOTE: Compressing data set WORK.HAVEC decreased size by 51.97 percent.
Compressed is 572 pages; un-compressed would require 1191 pages.


/* Binary compression */
data haveB(COMPRESS=binary);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;

/* Character compression (COMPRESS=yes is the same as COMPRESS=char) */
data haveC(COMPRESS=yes);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;


Can someone explain this behaviour to me?

Thanks
Patrick
Peter_C
Rhodochrosite | Level 12
To implement compression of either type, SAS stores information about where and how much compression was applied, along with the compressed values, in each page of data that it writes.
There is also row-level overhead.
Your experience will vary depending on the mix of data types and how wide or narrow the row is. I find character compression adequate for balancing CPU and I/O.
ymmv
peterC
Cynthia_sas
SAS Super FREQ
We have an example in our advanced programming/efficiencies course where this overhead actually makes the data set larger. So "it depends" or "your mileage may vary" or "you gotta benchmark it on your data" is about all anyone can say. I believe there are several good explanations of compression and its overhead out and about in papers, doc, etc., including these:
http://www2.sas.com/proceedings/sugi28/003-28.pdf
http://support.sas.com/resources/papers/proceedings09/065-2009.pdf
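
A minimal sketch of such a benchmark, not taken from the course or the papers above: it writes the same rows from Patrick's example with each COMPRESS= setting and then compares the page counts. The data set names cmp_no, cmp_char and cmp_bin are made up for illustration, and the character literal is copied as it appears in the thread.

```sas
/* Hypothetical benchmark: identical rows written with each COMPRESS= setting. */
data work.cmp_no  (compress=no)
     work.cmp_char(compress=char)
     work.cmp_bin (compress=binary);
  do i = 1 to 1000000 by 10;
    a = ' x ';
    output;   /* a plain OUTPUT writes the row to all three data sets */
  end;
run;

/* Compare the number of pages each data set occupies. */
proc sql;
  select memname, compress, obslen, npage
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('CMP_NO', 'CMP_CHAR', 'CMP_BIN')
    order by npage;
quit;
```

The log also prints a compression NOTE for each compressed data set, so the query is only a convenience for lining the results up side by side.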

cynthia
Patrick
Opal | Level 21
Hi Cynthia, Peter

Thanks for your answers.

And yes: RTM - got it.

Seems I was a bit naive to assume that binary compression uses a stronger compression algorithm in general (I had too much zip compression in mind).

Thanks
Patrick
polingjw
Quartz | Level 8
As a very general rule of thumb, binary compression is not effective unless the observation length is greater than a few hundred bytes. Of course, there are many additional considerations, and I’m sure that someone could come up with some counterexamples. See http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/viewer.htm#a001288760.htm for details.
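
To try the rule of thumb on a wider row, something like the following sketch could be used. It is an assumption-laden test case, not a result: the data set names wide_char and wide_bin, the 100 numeric variables and the fill pattern are all invented for illustration, and the outcome will depend on the actual data.

```sas
/* Hypothetical wide-row test: 101 numeric variables (i plus n1-n100), */
/* so each observation is several hundred bytes long.                  */
data work.wide_char(compress=char)
     work.wide_bin (compress=binary);
  array n[100];               /* creates numeric variables n1-n100 */
  do i = 1 to 100000;
    do j = 1 to dim(n);
      n[j] = mod(i + j, 7);   /* small integers between 0 and 6 */
    end;
    output;                   /* writes the row to both data sets */
  end;
  drop j;
run;
```

The compression NOTE that the log prints for each data set shows which setting did better for this particular layout.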
