
Low compression ratio in SAS

Frequent Contributor
Posts: 93

I am using the SAS COMPRESS= data set option, and the compression ratio I get is very low.

I have tried the BINARY and CHAR methods, and the compression obtained is 2% and 8%, respectively.

My code is below. Is there a more effective way to compress the data?

Any help will be greatly appreciated:

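/* Build a 1,000,000-row test data set */
data detalle(drop = i);
  length num1 num2 8 campo1 campo2 campo3 $10;
  do i = 1 to 1000000;
    num1 = i;
    num2 = round(20*ranuni(1));
    campo1 = 'aaaaaa';
    campo2 = compress('P'||round(10*ranuni(1)));
    campo3 = byte(65 + round(ranuni(1)*25));
    output; /* write one row per iteration */
  end;
run;

/* Each step below overwrites comprimida, so only the last one (CHAR) remains */
data comprimida(compress=BINARY);
  set detalle;
run;

data comprimida(compress=YES); /* YES is an alias for CHAR */
  set detalle;
run;

data comprimida(compress=char);
  set detalle;
run;

proc contents data=detalle; /* 48.3 MB */
run;

proc contents data=comprimida; /* 44.3 MB */
run;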

Super User
Posts: 3,110

Re: Low compression ratio in SAS

Compression is only useful when you have a larger number of columns and/or long character columns with lots of blank space. Try making your character columns 100 or 200 characters long.
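
To try that suggestion, here is a minimal sketch that reuses the test data from the question but declares the character columns as $200. The data set names detalle_ancho and comprimida_ancho are made up for this example, and CATS is swapped in for the COMPRESS('P'||...) expression only to avoid the numeric-to-character conversion note in the log.

```sas
/* Hypothetical variant of the original test data:
   same values, but character columns are 200 bytes wide,
   so each row carries a lot of trailing blanks */
data detalle_ancho(drop=i);
  length num1 num2 8 campo1 campo2 campo3 $200;
  do i = 1 to 1000000;
    num1 = i;
    num2 = round(20*ranuni(1));
    campo1 = 'aaaaaa';
    campo2 = cats('P', round(10*ranuni(1)));
    campo3 = byte(65 + round(ranuni(1)*25));
    output;
  end;
run;

/* Compress the wide version with the CHAR (run-length) method */
data comprimida_ancho(compress=char);
  set detalle_ancho;
run;

/* Compare the file sizes reported for the two data sets */
proc contents data=detalle_ancho;
run;

proc contents data=comprimida_ancho;
run;
```

After the compressing step, the SAS log should also contain a note stating by what percentage compression decreased (or increased) the size of the data set, which is quicker to read than comparing the two PROC CONTENTS listings.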

Super User
Posts: 5,085

Re: Low compression ratio in SAS

Those rates are low, but they are a reflection of the data you are compressing.

CHAR compression works best when you have repeated characters (such as missing values in your character variables).  It would also work better if you had integers as your numeric values.

BINARY works well when you have patterns of values that repeat.  In addition to the CHAR categories, that would also include missing values for numeric variables.

You can easily change the characteristics of your data to those that are better suited to compression.  But it's more likely you should choose data characteristics that more closely approximate what you expect in your real life data before deciding on the better method.
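
One way to follow that advice is to run both methods on a sample of the real table and compare the results. This is only a sketch: mylib.datos_reales is a placeholder for your own table, and the 100,000-row sample size is an arbitrary choice.

```sas
/* mylib.datos_reales is a hypothetical name: replace it with your real table */

/* Sample compressed with the CHAR (run-length) method */
data muestra_char(compress=char);
  set mylib.datos_reales(obs=100000);
run;

/* Same sample compressed with the BINARY method */
data muestra_bin(compress=binary);
  set mylib.datos_reales(obs=100000);
run;

/* Compare the sizes reported for each compressed sample */
proc contents data=muestra_char;
run;

proc contents data=muestra_bin;
run;
```

Because the two samples contain the same rows, any difference in size comes from the compression method alone. Keep in mind that the first rows of a table may not be representative of the whole.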

Good luck.
