Posted 11-22-2010 04:16 AM
Hi
I would have expected that in the example below binary compression reduces the size of the data set more than character compression.
Actually the opposite is true (SAS9.2, Win7, 64Bit):
NOTE: Compressing data set WORK.HAVEB decreased size by 45.84 percent.
Compressed is 645 pages; un-compressed would require 1191 pages.
NOTE: Compressing data set WORK.HAVEC decreased size by 51.97 percent.
Compressed is 572 pages; un-compressed would require 1191 pages.
data haveB(COMPRESS=binary);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;

data haveC(COMPRESS=yes);
  do i=1 to 1000000 by 10;
    a=' x ';
    output;
  end;
run;
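For reference, a quick way to confirm the compression type and page counts afterwards is PROC CONTENTS (just a sketch; the Compressed attribute and the number of data set pages should appear in its output):

proc contents data=haveB;
run;
proc contents data=haveC;
run;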
Can someone explain this behaviour to me?
Thanks
Patrick
4 REPLIES
To implement compression of either type, SAS stores information about where and how much compression was applied, along with the compressed values, in each page of data that gets written.
There is also row-level overhead.
Your experience will vary depending on the mix of data types and how wide or narrow the rows are. I find character compression adequate for balancing CPU and I/O; a rough way to benchmark both settings on your own data is sketched below.
ymmv
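A minimal sketch (WORK.SOURCE is just a placeholder for whatever table you want to test):

%macro try_compress(setting);
  /* copy the source table with the requested COMPRESS= setting */
  data _try_&setting.(compress=&setting.);
    set work.source;
  run;
%mend try_compress;

%try_compress(char)
%try_compress(binary)

Then compare the "decreased/increased size" NOTEs for _TRY_CHAR and _TRY_BINARY in the log.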
peterC
We have an example in our advanced programming/efficiencies course where this overhead can actually make the size -increase- (a small sketch of that situation follows the links below). So "it depends", or "your mileage may vary", or "you have to benchmark it on your own data" is about all anyone can say. There are several good explanations of compression and its overhead in papers and documentation, including these:
http://www2.sas.com/proceedings/sugi28/003-28.pdf
http://support.sas.com/resources/papers/proceedings09/065-2009.pdf
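As a small illustration of that overhead point, a sketch like the one below (a made-up example, not taken from the papers) writes observations that hold only one 8-byte numeric, so the per-observation compression overhead can outweigh any savings; check the NOTE in your own log to see whether the size goes up:

data tiny(compress=yes);
  /* each observation is a single numeric, so there is little to compress
     and the row-level overhead may dominate */
  do i=1 to 100000;
    output;
  end;
run;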
cynthia
cynthia
Hi Cynthia, Peter
Thanks for your answers.
And yes: RTM - got it.
Seems I was a bit naive to assume that binary compression uses a stronger compression algorithm in general (I had too much zip compression in mind).
Thanks
Patrick
As a very general rule of thumb, binary compression is not effective unless the observation length is greater than a few hundred bytes. Of course, there are many additional considerations, and I’m sure that someone could come up with some counterexamples. See http://support.sas.com/documentation/cdl/en/lrdict/63026/HTML/default/viewer.htm#a001288760.htm for details.
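For example, a sketch along these lines (data set names and values are made up) builds rows of a few hundred bytes with a mix of character and numeric data; per the rule of thumb above, this is the kind of row where COMPRESS=BINARY becomes worth testing. Results will still depend on the data and the host:

data wideB(compress=binary) wideC(compress=char);
  length txt $200;
  array n{40} 8;
  do i=1 to 200000;
    /* a 200-byte character value built from the formatted loop counter */
    txt=repeat(put(i,8.),24);
    /* 40 numeric variables with slowly changing values */
    do j=1 to 40;
      n{j}=int(i/(j*10));
    end;
    output wideB wideC;
  end;
run;

Compare the two compression NOTEs in the log to see which option wins for this kind of row.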