BookmarkSubscribeRSS Feed
bbb_NG
Fluorite | Level 6

Hi,

I have a sas7dbat file

NameTypeLengthFormatInformatLabelComments
branch_codeCharacter7$CHAR7.$CHAR7.BRANCH CODEAll value are of three english character
PFS Cust Segment CodeCharacter6   All value are of three english character
LoanCMPCharacter1    
OutstandingCMPCharacter1    
NationCMPCharacter3    
VerdateCharacter204   All value are of 6 numberic character as "201112"
New Cust IndCharacter2$2.00 $2.00 New Cust IndAll value are of one english character
OccupationCMPCharacter13   The longest value for this field is 10 english characters.
cmpCharacter25   The longest value for this field is 10 english characters.
NoCountNumeric8   ??it's length limit was 8, but I do find a value over 100M
TRBLevelCharacter9    

with 3.2M records and 837M disk space.

How to shrink it?

Thanks in advance.

5 REPLIES 5
art297
Opal | Level 21

This sounds like a quiz question.  Since unnecessarily used space still takes up space, you are wasting a lot of space in all of the character fields.  As for the numeric one, read the documentation: http://support.sas.com/documentation/cdl/en/hostunx/61879/HTML/default/viewer.htm#a000344718.htm

Hima
Obsidian | Level 7

Zipping the data set might help.

Astounding
PROC Star

First, you have to be very sure that your "Comments" are correct.  For example, you have to be 100% certain that VERDATE is never longer than six characters.  Since BRANCH_CODE uses a $CHAR7. format/informat, it is not enough to print some values.  The $CHAR formats imply that the variable might contain leading blanks.  You have to determine if it does contain leading blanks, whether you would like to left-hand-justify the characters.  The length needed might be impacted, but the sorted order might change as well.

After you have completed your analysis, set the option and the lengths appropriately.  For example:

options compess=YES;

data my_data;

   length VERDATE $ 6;

   set my_data;

run;

Set the length BEFORE the SET statement, for as many variables as you have analyzed to verify the needed length.

Good luck.

Tom
Super User Tom
Super User

You will probably get pretty good reduction by just using the COMPRESS=YES option.

If you are correct about the maximum lengths of the actual data then you can make the file smaller by re-creating it with shorter character variables.

Where you would want to avoid this is if this is just one example of a number of datasets that you have with this structure.  The other potential datasets might have longer actual values.

There is usually not much value (and some risk) in storing numbers in less than 8 bytes.  SAS always stores numbers are floating point.

The easiest way to recreate with shorter lengths is to set the length before referencing the original data (as suggested above).

Watch out for the variables that have permanent formats attached. You should change those (or better just remove them).  You will need to put the FORMAT statement after the set statement for it to win out.

data want ;

   length branch_code $3 .... ;

   set old ;

   format _character_ ;

run;

bbb_NG
Fluorite | Level 6

Dear All,

Thank you very much for your help and sharing thoughts.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 5 replies
  • 1245 views
  • 0 likes
  • 5 in conversation