Please help, I'm somewhat skilled with DI Studio - but not necessarily with SAS programming. I need some good details on how to read multiple CSV files into a SAS table using DI Studio.
The files will all have the same fields - but the filenames will vary. For example the filenames could be "Detail_12345.csv, Detail_45678.csv, and Detail_98765.csv
I've been reading that I should create user written code to read in the files at once. Another suggestion was to create an iterative job to pass the filename to the main job that read each file.
What do you suggest I do - and what are the specific transformations I should utilize?
thank you!
To use DIS efficiently you need to get to an at least intermediate SAS coding level (Base SAS, SQL and macro language).
Without the coding skills debugging (reading logs) will be difficult and you'll also be challenged to implement not only logically correct but also performant processes.
As for your question:
If your .csv's all have the same structure and don't have a header row with column names then you could use a wildcard: Detail_*.csv
Create a job with an external file metadata object for one of the .csv's, a File Reader and a target SAS table and test if this works. Once tested change the name of the external file in the external file metadata to the asterix syntax (the wildcard name).
Should your .csv's have a header row then things become a bit more involved and you need to give as as much detail as you can (i.e. how the folder where the .csv's are looks like - read everything in it or just files following a naming pattern, how often do you need to run the process, is this about a fully productionworthy implementation, etc. ).
One way to go would be to first create a listing of .csv names (user written code), load these names into a temporary SAS Work table and then use this work tables as control table for a loop transformation. Inside the loop transformation (the inner job) is external table metadata (this time with a macro variable in double quotes as file name - the parameter passed to the inner job, a file reader and an append node which collects all the data read via the iterations through the inner job).
If you've got header rows then there is eventually also a "hack" option which still would allow you to use the simple wildcard approach - but it's something I would first need to test myself before I can propose it.
I suggest that as a first step you implement a DIS job which reads a single .csv into your target table and we go from there.
To use DIS efficiently you need to get to an at least intermediate SAS coding level (Base SAS, SQL and macro language).
Without the coding skills debugging (reading logs) will be difficult and you'll also be challenged to implement not only logically correct but also performant processes.
As for your question:
If your .csv's all have the same structure and don't have a header row with column names then you could use a wildcard: Detail_*.csv
Create a job with an external file metadata object for one of the .csv's, a File Reader and a target SAS table and test if this works. Once tested change the name of the external file in the external file metadata to the asterix syntax (the wildcard name).
Should your .csv's have a header row then things become a bit more involved and you need to give as as much detail as you can (i.e. how the folder where the .csv's are looks like - read everything in it or just files following a naming pattern, how often do you need to run the process, is this about a fully productionworthy implementation, etc. ).
One way to go would be to first create a listing of .csv names (user written code), load these names into a temporary SAS Work table and then use this work tables as control table for a loop transformation. Inside the loop transformation (the inner job) is external table metadata (this time with a macro variable in double quotes as file name - the parameter passed to the inner job, a file reader and an append node which collects all the data read via the iterations through the inner job).
If you've got header rows then there is eventually also a "hack" option which still would allow you to use the simple wildcard approach - but it's something I would first need to test myself before I can propose it.
I suggest that as a first step you implement a DIS job which reads a single .csv into your target table and we go from there.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.