abhee
Calcite | Level 5

I want to compress the SAS data set while importing a .txt file into SAS. I have about 4 GB of data in the .txt file, which contains character and numeric variables. Is there any way to compress other than setting:

options compress = yes;

Thanks,

Abhee

3 REPLIES
LinusH
Tourmaline | Level 20

Why not use compress if you wish to compress?

If you have a majority of numeric variables, use compress=bin.
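As a minimal sketch of compressing only the imported table rather than every data set in the session, the COMPRESS= data set option can go on the output data set of the import step. The file path, the tab delimiter, the header row, and the variable names are assumptions for illustration, not taken from the original post.

```sas
/* Hypothetical path and layout: adjust to the real .txt file */
data work.imported (compress=binary);   /* or compress=char / compress=yes */
  /* Assumes a tab-delimited file with one header row */
  infile "/path/to/data.txt" dlm='09'x dsd truncover firstobs=2;
  length id 8 name $15 amount 8;
  input id name $ amount;
run;
```

The data set option applies to this one table and takes precedence over the COMPRESS= system option.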

Data never sleeps
jakarman
Barite | Level 11

Linus, compression in SAS is described with confusing words. SAS and confusing words are nothing new.

"CHAR to use the RLE (Run Length Encoding) compression algorithm." That is based on repeating bytes.

"BINARY to use the RDC (Ross Data Compression) algorithm." Not "Dr Ross's Compression Crypt" (that is the wrong one). The real reference is "A Simple Data-Compression Technique" (October 1992).

It has nothing to do with the type of the SAS variables. I made that association too and then found I was wrong. See SAS(R) 9.4 Language Reference: Concepts, Fourth Edition.
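One way to see which algorithm suits a given table is to write the same data with each setting and compare the results. This is only a sketch: it uses sashelp.cars, a sample table that ships with SAS, as a stand-in for the real data, and the output data set names are made up.

```sas
/* Write the same rows once with RLE (CHAR) and once with RDC (BINARY) */
data work.cars_char (compress=char);
  set sashelp.cars;
run;

data work.cars_bin (compress=binary);
  set sashelp.cars;
run;

/* PROC CONTENTS shows the compression setting and page counts of each copy */
proc contents data=work.cars_char;
run;

proc contents data=work.cars_bin;
run;
```

The log for each DATA step normally also carries a note saying by how much compression decreased or increased the size, so the comparison can be made on a sample before committing to a setting for the full 4 GB file.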

Going back to the question about compression and saving storage:

-> The best compression is achieved by storing no data at all.

    Call it data virtualization, federation, or whatever the sales buzzword is.

How can you do this technically?

- Use a SAS DATA step view on the text file. The view processes the text file only when it is needed. Logically you will see a SAS table (data set).
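A DATA step view along these lines might look as follows. The path, delimiter, header row, and variable names are hypothetical; only the `/ view=` syntax is the point.

```sas
/* Define the view: nothing is read from the text file at this point */
data work.raw_v / view=work.raw_v;
  /* Hypothetical tab-delimited file with one header row */
  infile "/path/to/data.txt" dlm='09'x dsd truncover firstobs=2;
  length id 8 name $15 amount 8;
  input id name $ amount;
run;

/* The text file is read here, when a step uses the view */
proc means data=work.raw_v;
  var amount;
run;
```

No SAS copy of the 4 GB of data is stored; the cost is that the text file is parsed again every time the view is referenced.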

To extend this,

- You can define PROC SQL views in SAS. Those are views on tables.

- You can define views within RDBMS systems. This can hide complexity and implement some type of security.

The disadvantage can be the processing overhead at run time (view on view, etc.). You can design intermediate materialized tables as checkpoints in your analysis process.
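A sketch of a PROC SQL view followed by a materialized checkpoint table, again using the sashelp.class sample table as a stand-in; the view and table names are hypothetical.

```sas
/* A PROC SQL view: stores the query, not the rows */
proc sql;
  create view work.teens_v as
    select name, sex, age
    from sashelp.class
    where age >= 13;
quit;

/* Materialize the view once as a compressed checkpoint table,
   so later steps do not re-run the query each time */
data work.teens (compress=yes);
  set work.teens_v;
run;
```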

---->-- ja karman --<-----
ballardw
Super User

One common cause of large txt files is extra columns that contain no data. Example: A name field that was defined to be 50 characters but the longest name in the actual file is 15 characters resulting in 35 unneeded blanks per data line in this field.

When you read the data into SAS specify the length of the variable as 15 and it will take less storage space.
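For a fixed-width file, this could be sketched as below. The path, the column positions, and the `age` variable are assumptions for illustration; the idea is only that the LENGTH statement, placed before INPUT, sets the stored width.

```sas
data work.people;
  /* Hypothetical fixed-width file */
  infile "/path/to/data.txt" truncover;
  /* The field occupies columns 1-50 in the file, but only 15 bytes are stored */
  length name $15;
  input name $ 1-50
        age    51-53;
run;
```

Any value longer than 15 characters would be truncated, so it is worth confirming the longest actual value before shortening the length.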

Something else that may reduce your SAS data set size is to use a custom informat (INVALUE statement) to turn a variable into either a numeric code or a shorter text code. This adds a small amount of complexity to maintenance if the values change occasionally, but let's look at a fairly common variable: Sex. Suppose the data source is sending you the values of sex as Male and Female. That would normally require 6 characters to store.

proc format;
  /* UPCASE converts input to uppercase before matching; DEFAULT=1 sets the default width to 1 */
  invalue $sex (upcase default=1)
    'MALE'   = 'M'
    'FEMALE' = 'F'
  ;
run;

and associate the informat with the variable in the reading program:

informat sex $sex.;
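A complete reading step might look like the sketch below. The inline sample rows and the `id` variable are made up for illustration. To keep the behavior explicit, this version sets the stored length with a LENGTH statement and reads each value with an explicit width of 6 through the colon modifier, which is a slight variation on the INFORMAT statement shown above.

```sas
proc format;
  invalue $sex (upcase default=1)
    'MALE'   = 'M'
    'FEMALE' = 'F'
  ;
run;

data work.demo;
  infile datalines truncover;
  length sex $1;            /* store a single character */
  input id sex :$sex6.;     /* read the whole word, then map it through the informat */
  datalines;
1 Male
2 female
3 FEMALE
;
run;

proc print data=work.demo;
run;
```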

This approach works very well with things that don't change very often, such as State and County names, or any data collected with a restricted selection field, such as a checkbox that only allows a single response.

