Data Trimming: One Method to Reduce Your Data Footprint
joeFurbee
Community Manager

When driving to and from work, I have several routes to choose from. Some include the highway, while others are all local roads. Which route I take depends on the time of day, the weather, or just a hunch. Similarly, SAS offers multiple methods to reduce the size of your data, and I cover one such approach in this article. Just as with driving, there may be a better way for you to shrink your data. In the words of many developers, "The answer is, it depends." If you have other techniques or suggestions, feel free to leave a comment.

 

The Overview

I've attached a script that performs the following steps:

  1. Scans all columns
  2. Identifies white space
  3. Reduces column byte allotment
  4. Produces a report on before and after file sizes
  5. Optionally loads the table into CAS
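The first three steps can be illustrated with a minimal sketch. This is not the attached script: it assumes a small demo table (work.demo), handles character columns only, and uses illustrative names (work.demo_trim, newlengths) that do not come from the article.

```sas
/* Demo table with over-allocated character columns (hypothetical data) */
data work.demo;
  length reason $20 job $20;
  reason = 'DebtCon'; job = 'Office'; loan = 1100; output;
  reason = 'HomeImp'; job = 'Mgr';    loan = 1300; output;
run;

/* Pass 1: find the longest stored value in each character column */
data _null_;
  set work.demo end=_last;
  array _c{*} _character_;        /* every character column in the input */
  array _m{500} _temporary_;      /* running maximums; assumes at most 500 columns */
  length _stmt $32767;            /* declared after the array, so not part of _c */
  do _k = 1 to dim(_c);
    /* lengthn ignores trailing blanks; keep a minimum width of 1 */
    _m{_k} = max(_m{_k}, lengthn(_c{_k}), 1);
  end;
  if _last then do;
    /* Build a LENGTH statement body such as: reason $7 job $6 */
    do _k = 1 to dim(_c);
      _stmt = catx(' ', _stmt, vname(_c{_k}), cats('$', _m{_k}));
    end;
    call symputx('newlengths', _stmt);
  end;
run;

/* Pass 2: rebuild the table with the shorter lengths.
   LENGTH comes before SET so the new widths take effect.
   SAS logs a "multiple lengths" warning here; no value is cut,
   because each width equals the longest value found in pass 1. */
data work.demo_trim;
  length &newlengths;
  set work.demo;
run;
```

Declaring the lengths ahead of SET moves the character columns to the front of the output table. If column order matters, PROC SQL with ALTER TABLE ... MODIFY is an alternative worth considering.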

 

The Details

First, create a new SAS Library called TRIM using the following SAS code.

libname trim '/export/viya/homes/john.doe@sas.com/data';

 

Create Data Set

Next, create a large data file based on the hmeq data set using the following SAS code:

data trim.hmeq;
set sampsio.hmeq;
do i=1 to 1000;
output;
end;
run;

This results in a 638MB file in the TRIM library.
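One way to confirm the size on disk is to open the file through a fileref and read its size attribute. The following is a sketch under assumptions: the path matches the LIBNAME statement above, and the information item name 'File Size (bytes)' is the one used on UNIX and Windows hosts (item names are host dependent).

```sas
filename myfile '/export/viya/homes/john.doe@sas.com/data/hmeq.sas7bdat';

data _null_;
  fid = fopen('myfile');                 /* open the file behind the fileref */
  if fid > 0 then do;
    /* finfo returns text, so convert it to a number */
    bytes = input(finfo(fid, 'File Size (bytes)'), best32.);
    mb = bytes / 1024**2;
    put 'hmeq.sas7bdat size in MB: ' mb 8.1;
    rc = fclose(fid);
  end;
  else put 'Could not open the file; check the path.';
run;

filename myfile clear;
```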

 

Upload, Update, and Run Code

Next, I uploaded and opened the attached trimcolumns.sas file. The following lines of code need to be replaced with values particular to your environment.

 

Line 70: %trimcolumns(in=trim.hmeq,out=trim.hmeq_trim); - replace with your input and output files

Line 74: filename myfile '/export/viya/homes/john.doe@sas.com/data/hmeq.sas7bdat'; - replace with the input file path

Line 83: filename myfile '/export/viya/homes/john.doe@sas.com/data/hmeq_trim.sas7bdat'; - replace with the output file path

Lines 97 - 100: replace the caslib and table names in various parameters

 

The Results

Once the code is kicked off, it processes the data and creates the new, trimmed file. The output resembles the following image.

[Image: report of before and after file sizes]

 

 

Notice that the trimming process reduced the file size from 638MB to 501MB. Results will certainly vary based on your input data, but I'm pleased with my almost 22 percent reduction in size. When we're talking GBs of data, this could be significant.
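The percentage follows directly from the two sizes: (638 - 501) / 638 is roughly 21.5 percent. A short sketch of that arithmetic, plus a query that lists the sizes SAS records for both tables, is shown here. It assumes the TRIM library is assigned and that the tables are named HMEQ and HMEQ_TRIM as in this article; work.reduction is an illustrative name.

```sas
/* Recompute the reduction from the sizes quoted in the article */
data work.reduction;
  before_mb = 638;
  after_mb  = 501;
  reduction = (before_mb - after_mb) / before_mb;
  put reduction= percent8.1;
run;

/* List the file sizes, in bytes, that SAS records for the two tables */
proc sql;
  select memname, filesize format=comma20.
  from dictionary.tables
  where libname = 'TRIM' and memname in ('HMEQ', 'HMEQ_TRIM');
quit;
```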

 

At the end of the script, there is commented-out code. This code loads the data into a caslib of your choice on the CAS server. 
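For reference, a load of this kind can be sketched with PROC CASUTIL. This is not the code from the script: the session name (mysess), the target caslib (casuser), and the CAS table name (hmeq_trim) are assumptions to replace with your own values.

```sas
cas mysess;                        /* start a CAS session; the name is arbitrary */

proc casutil;
  load data=trim.hmeq_trim         /* trimmed SAS data set from this article */
       outcaslib="casuser"         /* assumed target caslib */
       casout="hmeq_trim"          /* assumed CAS table name */
       replace;                    /* overwrite a table of the same name */
  list tables incaslib="casuser";  /* confirm the table is in memory */
quit;
```

A table loaded this way has session scope, so it disappears when the session ends unless it is promoted.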

 

Finally, keep in mind that in this exercise the columns were sized to fit the current data. If rows are later added to the trimmed data set, incoming values that exceed the new widths may cause issues, such as truncation.
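A small, self-contained illustration of that risk follows. The table names and the 'Refinance' value are hypothetical and are not taken from the hmeq data.

```sas
/* Base table after trimming: REASON holds at most 7 characters */
data work.base;
  length reason $7;
  reason = 'DebtCon';
run;

/* Hypothetical new rows containing a longer value */
data work.incoming;
  length reason $12;
  reason = 'Refinance';
run;

/* Without FORCE, PROC APPEND does not append when a variable is longer
   in DATA= than in BASE=. With FORCE, the rows are added, but values
   are cut to the base length and a warning is written to the log. */
proc append base=work.base data=work.incoming force;
run;

proc print data=work.base;
run;
```

Here the appended value is stored as 'Refinan', so it is worth comparing incoming column lengths against the trimmed table before appending.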

 

Conclusion

As I stated earlier, there is no silver bullet when trying to reduce the size of large data sets. Many factors determine which method to adopt. This article has provided a tool that is easy to set up and run.
