When driving to and from work in the mornings and evenings, I have several choices of routes to take. Some include the highway, while others are all local roads. Which route I take depends on the time of day, the weather, or just a hunch. Likewise, in SAS there are multiple methods to reduce the size of your data, and I cover one such approach in this article. And just like driving, there may be a better way for you to shrink your data. In the words of many developers, "The answer is, it depends." If you have other techniques or suggestions, feel free to leave a comment.
I've attached a script that performs the following steps:
First, create a new SAS Library called TRIM using the following SAS code.
libname trim '/export/viya/homes/john.doe@sas.com/data';
Next, create a large data file based on the hmeq data set using the following SAS code:
data trim.hmeq;
  set sampsio.hmeq;
  do i=1 to 1000;
    output;
  end;
run;
This results in a 638MB file in the TRIM library.
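Before trimming, it can help to see how wide each column is stored. The following is a minimal check, assuming the TRIM library and the hmeq table created above; PROC CONTENTS lists each variable's type and stored length and, for a Base SAS data set, typically reports the file size as well.

```sas
/* List the variables of the enlarged table in creation order,   */
/* including each variable's type and stored length.             */
proc contents data=trim.hmeq varnum;
run;
```

Columns whose stored length is much larger than the values they hold are the ones where trimming is likely to pay off.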
Next, I uploaded and opened the attached trimcolumns.sas file. The following lines of code need to be replaced with values particular to your environment:
Line 70: %trimcolumns(in=trim.hmeq,out=trim.hmeq_trim); - replace with your input and output files
Line 74: filename myfile '/export/viya/homes/john.doe@sas.com/data/hmeq.sas7bdat'; - replace with the input file path
Line 83: filename myfile '/export/viya/homes/john.doe@sas.com/data/hmeq_trim.sas7bdat'; - replace with the output file path
Lines 97 - 100: replace the caslib and table names in various parameters
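For readers who want a feel for the general technique before opening the attachment, here is a minimal sketch of column trimming. It is not the attached script. It covers character columns only, assumes a two-level input name and ordinary SAS variable names, and the macro name trim_sketch and the output table work.hmeq_sketch are hypothetical.

```sas
/* Hypothetical sketch, not the attached script: resize each       */
/* character column to the longest value currently stored in it.   */
%macro trim_sketch(in=, out=);
  %local lib mem n i;
  /* Assumes IN= is a two-level name such as trim.hmeq */
  %let lib = %upcase(%scan(&in, 1, .));
  %let mem = %upcase(%scan(&in, 2, .));
  %let n = 0;

  proc sql noprint;
    /* Collect the character column names into cvar1, cvar2, ... */
    select name into :cvar1-
      from dictionary.columns
      where libname = "&lib" and memname = "&mem" and type = 'char';
    %let n = &sqlobs;

    /* Longest stored value (in bytes) for each character column */
    %if &n > 0 %then %do;
      select
        %do i = 1 %to &n;
          %if &i > 1 %then %str(,);
          max(lengthn(&&cvar&i))
        %end;
        into
        %do i = 1 %to &n;
          %if &i > 1 %then %str(,);
          :clen&i trimmed
        %end;
        from &in;
    %end;
  quit;

  /* Declare the new lengths before SET so the narrower widths win. */
  /* A floor of 1 keeps all-blank columns valid.                    */
  data &out;
    %if &n > 0 %then %do;
      length
        %do i = 1 %to &n;
          &&cvar&i $ %sysfunc(max(1, &&clen&i))
        %end;
      ;
    %end;
    set &in;
  run;
%mend trim_sketch;

%trim_sketch(in=trim.hmeq, out=work.hmeq_sketch);
```

Two side effects are worth knowing about. SAS normally writes a warning about multiple lengths being specified for the resized variables, which is expected here, and because the LENGTH statement comes before SET, the character columns move to the front of the output table.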
Once the code is kicked off, it processes the data and creates the new, trimmed file. The output resembles the following image.
Notice that the trimming process reduced the file size from 638MB to 501MB. Results will certainly vary based on your input data, but I'm pleased with my almost 22 percent reduction in size. When we're talking GBs of data, this could be significant.
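With the figures above, the saving works out to (638 - 501) / 638, or roughly 21.5 percent. To compute the same number for your own data, a query along these lines may help. It is a sketch that assumes both tables live in the TRIM library under the names used in this article, and that the filesize column of dictionary.tables is populated for your engine.

```sas
/* Compare the on-disk size of the original and trimmed tables. */
proc sql;
  select o.filesize / 1024**2       as original_mb format=8.1,
         t.filesize / 1024**2       as trimmed_mb  format=8.1,
         1 - t.filesize / o.filesize as reduction   format=percent8.1
    from dictionary.tables as o,
         dictionary.tables as t
    where o.libname = 'TRIM' and o.memname = 'HMEQ'      and o.memtype = 'DATA'
      and t.libname = 'TRIM' and t.memname = 'HMEQ_TRIM' and t.memtype = 'DATA';
quit;
```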
At the end of the script, there is commented-out code. This code loads the data into a caslib of your choice on the CAS server.
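The commented-out code itself is in the attachment. As a rough illustration of what such a load can look like, here is a sketch using PROC CASUTIL. The session name mysess, the caslib casuser, and the output table name are placeholders I chose for the example, and it assumes a CAS server is reachable from your SAS session.

```sas
/* Placeholders: session name, caslib, and output table name.  */
/* Start a CAS session, then load the trimmed table into it.   */
cas mysess;

proc casutil;
  /* Copy the trimmed SAS data set into the chosen caslib */
  load data=trim.hmeq_trim outcaslib="casuser" casout="hmeq_trim" replace;
  /* Confirm that the table is now in memory */
  list tables incaslib="casuser";
quit;
```

A table loaded this way has session scope, so it goes away when the session ends unless you promote it.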
Finally, keep in mind that in this exercise the columns were sized to fit the current data. If rows are later added to the trimmed data set, incoming values that exceed the new widths may be truncated or cause other issues.
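One way to guard against this is to compare column widths before adding rows. The sketch below assumes the new rows sit in a table called work.new_rows, a hypothetical name, and lists every column that is declared wider there than in the trimmed table.

```sas
/* Flag columns whose declared length in the incoming table   */
/* (work.new_rows, hypothetical) exceeds the trimmed width.   */
proc sql;
  select b.name,
         b.length as trimmed_length,
         n.length as incoming_length
    from dictionary.columns as b
         inner join dictionary.columns as n
         on upcase(b.name) = upcase(n.name)
    where b.libname = 'TRIM' and b.memname = 'HMEQ_TRIM'
      and n.libname = 'WORK' and n.memname = 'NEW_ROWS'
      and n.length > b.length;
quit;
```

An empty result means no incoming column is declared wider than its trimmed counterpart. Any rows returned point to columns worth checking, or re-trimming, before the data is combined.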
As I stated earlier, there is no silver bullet solution when trying to reduce the size of large data sets. Many factors are at play in deciding which method to adopt. This article has provided a tool that is easily set up and run.