How to compress data while importing .txt file

Reply
Occasional Contributor
Posts: 13

How to compress data while importing .txt file

I want to compress the SAS data while importing .txt file into SAS. I have ~4gb for data in .txt file which contains character and numeric variables. Is there any way to compress instead of:

options compress = yes;

Thanks,

Abhee

Super User
Posts: 5,431

Re: How to compress data while importing .txt file

Why not use compress if you wish to compress?

If you have a majority of numeric variables, use compress=bin.

Data never sleeps
Trusted Advisor
Posts: 3,214

Re: How to compress data while importing .txt file

Linus compressing in SAS is using confusing words. SAS and confusing words not something new.

"CHAR to use the RLE (Run Length Encoding) compression algorithm." That is based on repeating bytes.

"BINARY to use the RDC (Ross Data Compression) algorithm"      Dr Ross's Compression Crypt (the wrong one). The real one is October 1992/A Simple Data-Compression Technique

It has nothing to do with the type of sas-variables. I have made that association also and than found being w   rong.  SAS(R) 9.4 Language Reference: Concepts, Fourth Edition

Going back to the compress / data storage saving / question.

-> the best compression is achieved by having no data stored.

    Call it data-virtualization federation or whatever as some sales buzzword.

Hoe can you do this technically?

- use a sas-datastep view on the text-file.  The result, the view, will process the data text-file when necessary. Logically you will see a sas-table dataset.

To extend this,

- you can defines proc sql views in sas. that are views on tables.

- you can define views withins RDBMS systems. This can hide complexity and implement some type of security.

The disadvantage can be all processing overhead when running. (view on view etc). You can design intermediate materalized tables as some checkpoint in your analyses proces 

---->-- ja karman --<-----
Super User
Posts: 11,343

Re: How to compress data while importing .txt file

One common cause of large txt files is extra columns that contain no data. Example: A name field that was defined to be 50 characters but the longest name in the actual file is 15 characters resulting in 35 unneeded blanks per data line in this field.

When you read the data into SAS specify the length of the variable as 15 and it will take less storage space.

Something else that may reduce your SAS dataset size is to use an INVALUE custom format to turn a variable into either a numeric code or a shorter text code. This adds a small amount of complexity to maintenance if the values change occasionally but lets look at fairly common variable: Sex. Suppose the data set is sending you the values of sex as Male and Female. That would normally require 6 characters to store.

proc format;

invalue $sex Upcase default=1)

'MALE' = 'M'

'FEMALE' = 'F'

;

run;

and associate the informat with the variable in the reading program:

informat sex $sex.;

This approach very well with things that don't change very often such as State and County names, any data collected with a restricted selection field such a checkbox that only allows single responses.

Ask a Question
Discussion stats
  • 3 replies
  • 235 views
  • 0 likes
  • 4 in conversation