Hi all,
In an effort to reduce file size, I found a macro named %squeeze, with the code here, and tried to apply it to my dataset. The result is not what I expected, which I find quite strange.
I have a compressed dataset, ex_non_trading (I created it by using options compress=yes in another data step). I ran the %squeeze macro as follows:
options compress=yes reuse=yes;
%squeeze(my.ex_non_trading, squozennn)
proc contents data=my.ex_non_trading;
run;
proc contents data=squozennn;
run;
proc means data=my.ex_non_trading;
title 'ex_non_trading';
run;
proc means data=squozennn;
title 'squozennn';
run;
and the output looks like this:
We can see that the file sizes of the two datasets are not really different.
Looking at the log, I saw that options compress=yes on its own reduces the file size by around 70%:
NOTE: There were 10978714 observations read from the data set MY.EX_NON_TRADING.
NOTE: The data set WORK.SQUOZENNN has 10978714 observations and 15 variables.
NOTE: Compressing data set WORK.SQUOZENNN decreased size by 70.05 percent.
Compressed is 20550 pages; un-compressed would require 68618 pages.
NOTE: DATA statement used (Total process time):
real time 33.49 seconds
cpu time 9.29 seconds
207 proc contents data=my.ex_non_trading;
208 run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.05 seconds
cpu time 0.03 seconds
209 proc contents data=squozennn;
210 run;
I then tried running the %squeeze macro without options compress=yes, and the output squozennn is now up to 4GB, four times the size of the original ex_non_trading. That surprised me.
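To compare the two datasets side by side instead of reading two separate PROC CONTENTS listings, I think a query like the one below would work. This is only a sketch: it assumes the libref is MY and the squeezed copy is in WORK, as in my code above, and that my SAS release exposes the filesize column in dictionary.tables.

```sas
/* Sketch: list size-related attributes for both data sets in one table. */
/* Library and member names are stored in upper case in the dictionary.  */
proc sql;
  select libname, memname, nobs, obslen, npage,
         filesize format=comma20., compress, pcompress
  from dictionary.tables
  where (libname = 'MY'   and memname = 'EX_NON_TRADING')
     or (libname = 'WORK' and memname = 'SQUOZENNN');
quit;
```

Here obslen is the uncompressed observation length, so it should show whether %squeeze actually shortened the variables, while npage and filesize show what ends up on disk.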
I also had a look at another document about options compress=yes.
It says:
Compressing a file is a process that reduces the number of bytes required to represent each observation. In a compressed file, each observation is a variable-length record, while in an uncompressed file, each observation is a fixed-length record.
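To see this effect in isolation, a small test like the following might help. It is a sketch with made-up data: the dataset names padded_plain and padded_comp and the variables note and i are hypothetical, and the point is only to write the same rows once with compress=no and once with compress=yes.

```sas
/* Hypothetical test data: a $200 variable that only ever holds short values. */
data work.padded_plain (compress=no)
     work.padded_comp  (compress=yes);
  length note $200;
  do i = 1 to 100000;
    note = cats('row', i);   /* a few characters, padded with blanks to 200 */
    output;                  /* no data set named, so both outputs get the row */
  end;
run;

/* Compare the page counts and file sizes reported for each copy. */
proc contents data=work.padded_plain; run;
proc contents data=work.padded_comp;  run;
```

The declared length of note is 200 in both copies. Only the stored record differs, which matches the fixed-length versus variable-length description quoted above.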
So, in this case, do we still need the %squeeze macro when options compress=yes has already done the job? From my understanding, %squeeze finds, for each variable, the longest length actually needed across the whole dataset and shortens the variable to that, whereas compress=yes shortens each observation individually.
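To make the comparison concrete, this is how I picture the per-variable idea for a single character variable. It is not the %squeeze macro itself, only a minimal sketch of the technique, and the names demo_wide, demo_short, note, and maxlen are hypothetical.

```sas
/* Hypothetical input: a $200 variable whose values are all short. */
data work.demo_wide;
  length note $200;
  do i = 1 to 1000;
    note = cats('row', i);
    output;
  end;
run;

/* Find the longest value actually stored in the variable. */
proc sql noprint;
  select max(length(note)) into :maxlen
  from work.demo_wide;
quit;
%let maxlen = &maxlen;       /* strip the padding blanks from the macro value */

/* Rebuild the data set with the shorter declared length. */
data work.demo_short;
  length note $&maxlen;      /* declared before SET so the new length wins */
  set work.demo_wide;        /* SAS may warn about multiple lengths here   */
run;
```

This changes the declared length for every observation, so it reduces the fixed record size even when compress=no, whereas compress=yes leaves the declared length alone.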
Warmest regards.
P.S. Whoops, I also found a macro named %squeeze1. Has anyone here used this code, and does it work well?