I want to compress the SAS data while importing .txt file into SAS. I have ~4gb for data in .txt file which contains character and numeric variables. Is there any way to compress instead of:
options compress = yes;
Thanks,
Abhee
Why not use compress if you wish to compress?
If you have a majority of numeric variables, use compress=bin.
Linus compressing in SAS is using confusing words. SAS and confusing words not something new.
"CHAR to use the RLE (Run Length Encoding) compression algorithm." That is based on repeating bytes.
"BINARY to use the RDC (Ross Data Compression) algorithm" Dr Ross's Compression Crypt (the wrong one). The real one is October 1992/A Simple Data-Compression Technique
It has nothing to do with the type of sas-variables. I have made that association also and than found being w rong. SAS(R) 9.4 Language Reference: Concepts, Fourth Edition
Going back to the compress / data storage saving / question.
-> the best compression is achieved by having no data stored.
Call it data-virtualization federation or whatever as some sales buzzword.
Hoe can you do this technically?
- use a sas-datastep view on the text-file. The result, the view, will process the data text-file when necessary. Logically you will see a sas-table dataset.
To extend this,
- you can defines proc sql views in sas. that are views on tables.
- you can define views withins RDBMS systems. This can hide complexity and implement some type of security.
The disadvantage can be all processing overhead when running. (view on view etc). You can design intermediate materalized tables as some checkpoint in your analyses proces
One common cause of large txt files is extra columns that contain no data. Example: A name field that was defined to be 50 characters but the longest name in the actual file is 15 characters resulting in 35 unneeded blanks per data line in this field.
When you read the data into SAS specify the length of the variable as 15 and it will take less storage space.
Something else that may reduce your SAS dataset size is to use an INVALUE custom format to turn a variable into either a numeric code or a shorter text code. This adds a small amount of complexity to maintenance if the values change occasionally but lets look at fairly common variable: Sex. Suppose the data set is sending you the values of sex as Male and Female. That would normally require 6 characters to store.
proc format;
invalue $sex Upcase default=1)
'MALE' = 'M'
'FEMALE' = 'F'
;
run;
and associate the informat with the variable in the reading program:
informat sex $sex.;
This approach very well with things that don't change very often such as State and County names, any data collected with a restricted selection field such a checkbox that only allows single responses.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.