BookmarkSubscribeRSS Feed
Sean_OConnor
Fluorite | Level 6

Folks,

 

I have 13 individual csv files which in total come to nearly 40GB in size. I'm looking for tips in terms of getting efficiencies in reading them to SAS.

 

Originally I had code which would list the 13 csv files and their location and a macro which would cycle through each one and use proc import to read them in and output them to as a SAS file.

 

However, I'm wondering is it best to this or to use an infile statement as I know the position of all the columns and their lengths. 

 

My macro is handy as it uses guessing rows = max but I imagine with such large files now it will take ages to run. 

 

Perhaps an infile method would be better? 

3 REPLIES 3
ChrisHemedinger
Community Manager

INFILE is definitely more efficient, especially since you know the schema for these files. And if all files share the same layout, you can do this in a single DATA step. See this article for how to read multiple text files at once.

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
yabwon
Onyx | Level 15

Furthermore, you can zip that CSV files to save some space and than use the "INFILE ZIP" to read directly from the zipped file.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Is the list of files static?  Perhaps just create a fileref that points to all of the files and read from that.

If you need to skip header rows test for when you are starting a new file.

filename all
('/dir/file1.csv'
 '/dir/file2.csv'
...
 '/dir/file13.csv'
);
data want;
  length fname $256 ;
  infile all dsd truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  * code to define variables and input a record ;
run;

Or it you just want to read all of the files whose names follow a simple pattern then use a * wildcard in the physical filename in the INFILE statement.

data want;
  length fname $256 ;
  infile '/dir/file*.csv' dsd truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  * code to define variables and input a record ;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 923 views
  • 4 likes
  • 4 in conversation