BookmarkSubscribeRSS Feed
Sean_OConnor
Fluorite | Level 6

Folks,

 

I have 13 individual csv files which in total come to nearly 40GB in size. I'm looking for tips in terms of getting efficiencies in reading them to SAS.

 

Originally I had code which would list the 13 csv files and their location and a macro which would cycle through each one and use proc import to read them in and output them to as a SAS file.

 

However, I'm wondering is it best to this or to use an infile statement as I know the position of all the columns and their lengths. 

 

My macro is handy as it uses guessing rows = max but I imagine with such large files now it will take ages to run. 

 

Perhaps an infile method would be better? 

3 REPLIES 3
ChrisHemedinger
Community Manager

INFILE is definitely more efficient, especially since you know the schema for these files. And if all files share the same layout, you can do this in a single DATA step. See this article for how to read multiple text files at once.

SAS Hackathon registration is open! Build your skills. Make connections. Enjoy creative freedom. Maybe change the world.
yabwon
Onyx | Level 15

Furthermore, you can zip that CSV files to save some space and than use the "INFILE ZIP" to read directly from the zipped file.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Is the list of files static?  Perhaps just create a fileref that points to all of the files and read from that.

If you need to skip header rows test for when you are starting a new file.

filename all
('/dir/file1.csv'
 '/dir/file2.csv'
...
 '/dir/file13.csv'
);
data want;
  length fname $256 ;
  infile all dsd truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  * code to define variables and input a record ;
run;

Or it you just want to read all of the files whose names follow a simple pattern then use a * wildcard in the physical filename in the INFILE statement.

data want;
  length fname $256 ;
  infile '/dir/file*.csv' dsd truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  * code to define variables and input a record ;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 729 views
  • 4 likes
  • 4 in conversation