Phil_NZ
Barite | Level 11

Hi all,

 

In an effort to reduce file size, I found a macro named %squeeze (with the code here) and tried to apply it to my dataset. The result was not what I expected.

I have a compressed dataset, ex_non_trading (created with options compress=yes in an earlier DATA step). I ran %squeeze as follows:

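options compress=yes reuse=yes;          /* compress new data sets; let them reuse freed space */
%squeeze(my.ex_non_trading, squozennn)   /* write the squeezed copy to WORK.SQUOZENNN */
proc contents data=my.ex_non_trading;    /* attributes and size of the original */
run;
proc contents data=squozennn;            /* attributes and size of the squeezed copy */
run;

proc means data=my.ex_non_trading;
title 'ex_non_trading';
run;

proc means data=squozennn;
title 'squozennn';
run;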

The output looks like this:

[Screenshot: output for my.ex_non_trading]

[Screenshot: output for squozennn]

As you can see, the file sizes of the two datasets are not very different.

Looking at the log, I saw that compression alone reduced the size of squozennn by about 70%:

NOTE: There were 10978714 observations read from the data set MY.EX_NON_TRADING.
NOTE: The data set WORK.SQUOZENNN has 10978714 observations and 15 variables.
NOTE: Compressing data set WORK.SQUOZENNN decreased size by 70.05 percent. 
      Compressed is 20550 pages; un-compressed would require 68618 pages.
NOTE: DATA statement used (Total process time):
      real time           33.49 seconds
      cpu time            9.29 seconds
      

207        proc contents data=my.ex_non_trading;
208        run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.05 seconds
      cpu time            0.03 seconds
      

209        proc contents data=squozennn;
210        run;

When I ran %squeeze without compress=yes, the output squozennn grew to about 4 GB, roughly four times the size of the original ex_non_trading. That surprised me.
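The page counts in the log above are consistent with this jump. Assuming the uncompressed squozennn uses the same page size as the compressed one, the log's own figures give:

```latex
\text{saving} = 1 - \frac{20550}{68618} \approx 1 - 0.2995 = 0.7005 \;(70.05\%)
\qquad
\frac{68618}{20550} \approx 3.34
```

So dropping compress=yes makes the squeezed table about 3.3 times larger in pages, in the same ballpark as the roughly fourfold growth observed. In this case, compression rather than squeezing accounts for most of the savings.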

[Screenshot: output for squozennn created without compress=yes]

I also looked at another document about the COMPRESS= option.

It states that:

Compressing a file is a process that reduces the number of bytes required to represent each observation. In a compressed file, each observation is a variable-length record, while in an uncompressed file, each observation is a fixed-length record.

So, in this case, do we still need %squeeze when compress=yes already does the job? My understanding is that %squeeze finds the maximum length needed for each variable, whereas compress=yes works at the level of each observation.
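The per-variable idea described above can be sketched for a single character variable: find the longest value actually stored, then redeclare the variable with that length. This is only an illustration of the concept, not the %squeeze macro itself. TICKER and WORK.SQUEEZED_SKETCH are hypothetical names, and $8 stands in for whatever maximum the query reports.

```sas
/* Step 1: longest non-blank value actually stored in TICKER           */
/* (TICKER is a hypothetical character variable in MY.EX_NON_TRADING)  */
proc sql;
   select max(lengthn(ticker)) as max_len_ticker
      from my.ex_non_trading;
quit;

/* Step 2: rewrite the data set with the reduced defined length.       */
/* $8 is a placeholder for the value reported in step 1. SAS may warn  */
/* about multiple lengths; step 1 is what shows nothing gets truncated. */
data work.squeezed_sketch;
   length ticker $8;   /* must come before SET to set the length */
   set my.ex_non_trading;
run;
```

Note that placing LENGTH before SET also moves TICKER to the front of the variable order.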

 

Warmest regards.

P.S. I also found a macro named %squeeze1. Has anyone used this code, and does it work well?

Thank you for your help, and have a fabulous, productive day! I am a novice today, but someday, once I have accumulated enough knowledge, I hope to help others in turn.
Accepted Solution
Kurt_Bremser
Super User

%SQUEEZE reduces the defined length of variables, e.g. the numeric length of a date to 4.
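A small illustration of what a reduced numeric length means, assuming SAS on Windows or UNIX, where a 4-byte numeric holds integers exactly only up to 2,097,152. Dates (day counts since 1 January 1960) fall well within that range; large identifiers may not. The data set and variable names are hypothetical.

```sas
/* Store two numerics in 4 bytes instead of the default 8. */
data work.len_demo;
   length trade_date 4 big_id 4;
   format trade_date date9.;
   trade_date = '15MAR2021'd;   /* a date: stored exactly in 4 bytes      */
   big_id     = 123456789;      /* above 2,097,152: not stored exactly    */
run;

/* Reading the values back shows what was actually written to disk. */
data _null_;
   set work.len_demo;
   put trade_date= big_id= best12.;
run;
```

The truncation only happens when the value is written to the data set; inside the DATA step every numeric is still held as 8 bytes, which is why the check reads the table back in a second step.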

COMPRESS reduces the used length by compressing sequences of repeated characters (mainly the blanks).

 

Squeezed datasets may cause problems later if you have to combine datasets whose defined lengths differ because of their content. COMPRESS on its own never poses such a problem; there are some datasets where compressing actually increases the file size, but not by a large margin.
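The combining problem can be reproduced with two tiny data sets whose character lengths differ, as can happen after squeezing each one separately. In a SET statement, the first data set that defines a variable sets its length, so longer values from later data sets are truncated (SAS writes a "Multiple lengths were specified" warning to the log). Declaring the length before SET avoids this. TICKER and the WORK table names are hypothetical.

```sas
/* Two data sets squeezed independently end up with different lengths. */
data work.a;  length ticker $3;   ticker = 'ABC';        run;
data work.b;  length ticker $10;  ticker = 'ABCDEFGHIJ'; run;

/* TICKER inherits $3 from WORK.A: the value from WORK.B becomes 'ABC'. */
data work.combined;
   set work.a work.b;
run;

/* Declaring the longest length first keeps every value intact. */
data work.combined_ok;
   length ticker $10;
   set work.a work.b;
run;
```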


3 REPLIES
SASKiwi
PROC Star

In my experience, a lot of SAS sites have COMPRESS=YES switched on as a permanent session option because it can both reduce disk storage significantly and reduce I/O. You might also try COMPRESS=BINARY, as that can sometimes do better than YES.

I never bother with %SQUEEZE, as it requires additional processing with unpredictable results.
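One way to act on this advice is to write the same data with each COMPRESS= setting and compare the results. This is a sketch assuming MY.EX_NON_TRADING from the question; the WORK table names are illustrative, and the query reads the NPAGE and FILESIZE columns of DICTIONARY.TABLES. Which setting wins depends on the data: CHAR (equivalent to YES) uses run-length encoding, which suits blank-padded character data, while BINARY uses Ross Data Compression, which tends to do better on long observations with many numeric variables.

```sas
/* Write one copy per COMPRESS= setting in a single pass;          */
/* the data set option overrides the session-level COMPRESS=.      */
data work.c_no(compress=no)
     work.c_char(compress=char)
     work.c_binary(compress=binary);
   set my.ex_non_trading;
run;

/* Compare page counts and file sizes of the three copies. */
proc sql;
   select memname, compress, npage, filesize
      from dictionary.tables
      where libname = 'WORK'
        and memname in ('C_NO', 'C_CHAR', 'C_BINARY');
quit;
```

The log also reports the compression percentage for each compressed copy, as in the note shown in the question.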


ChrisNZ
Tourmaline | Level 20

You are much better off storing your data using the SPDE engine with binary compression than with any other method. If you do that, there is also no need to end up with unpredictable variable lengths (which will give you headaches when merging).
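A minimal sketch of that setup, assuming a hypothetical libref SPDELIB and directory path (replace both with your own): the LIBNAME statement assigns the SPD Engine, and the COMPRESS=BINARY data set option compresses the table as it is written. The variable lengths are copied unchanged from the source table, so nothing is squeezed.

```sas
/* Hypothetical libref and path: point this at your own storage location. */
libname spdelib spde '/path/to/spde_data';

/* Copy the table into the SPD Engine library with binary compression. */
data spdelib.ex_non_trading(compress=binary);
   set my.ex_non_trading;
run;
```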
