BookmarkSubscribeRSS Feed
Sean_OConnor
Obsidian | Level 7

Folks,

 

I have 13 individual csv files which in total come to nearly 40GB in size. I'm looking for tips in terms of getting efficiencies in reading them to SAS.

 

Originally I had code which would list the 13 csv files and their location and a macro which would cycle through each one and use proc import to read them in and output them to as a SAS file.

 

However, I'm wondering is it best to this or to use an infile statement as I know the position of all the columns and their lengths. 

 

My macro is handy as it uses guessing rows = max but I imagine with such large files now it will take ages to run. 

 

Perhaps an infile method would be better? 

3 REPLIES 3
ChrisHemedinger
Community Manager

INFILE is definitely more efficient, especially since you know the schema for these files. And if all files share the same layout, you can do this in a single DATA step. See this article for how to read multiple text files at once.

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!
yabwon
Onyx | Level 15

Furthermore, you can zip that CSV files to save some space and than use the "INFILE ZIP" to read directly from the zipped file.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Is the list of files static?  Perhaps just create a fileref that points to all of the files and read from that.

If you need to skip header rows test for when you are starting a new file.

filename all
('/dir/file1.csv'
 '/dir/file2.csv'
...
 '/dir/file13.csv'
);
data want;
  length fname $256 ;
  infile all dsd truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  * code to define variables and input a record ;
run;

Or it you just want to read all of the files whose names follow a simple pattern then use a * wildcard in the physical filename in the INFILE statement.

data want;
  length fname $256 ;
  infile '/dir/file*.csv' dsd truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  * code to define variables and input a record ;
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1059 views
  • 4 likes
  • 4 in conversation