kjarvis
Fluorite | Level 6

Please help, I'm somewhat skilled with DI Studio - but not necessarily with SAS programming. I need some good details on how to read multiple CSV files into a SAS table using DI Studio.

 

The files will all have the same fields, but the filenames will vary. For example, the filenames could be "Detail_12345.csv", "Detail_45678.csv", and "Detail_98765.csv".

 

I've been reading that I should create user-written code to read in all the files at once. Another suggestion was to create an iterative job that passes each filename to a main job that reads the file.

 

What do you suggest I do, and what are the specific transformations I should utilize?

 

thank you!

ACCEPTED SOLUTION
Patrick
Opal | Level 21

To use DIS efficiently you need to reach at least an intermediate SAS coding level (Base SAS, SQL and the macro language).

 

Without those coding skills, debugging (reading logs) will be difficult, and you'll also find it challenging to implement processes that are not only logically correct but also performant.

 

As for your question:

If your .csv's all have the same structure and don't have a header row with column names, then you could use a wildcard: Detail_*.csv

Create a job with an external file metadata object for one of the .csv's, a File Reader and a target SAS table, and test whether this works. Once tested, change the name of the external file in the external file metadata to the wildcard (asterisk) syntax.
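As a rough sketch of the DATA step that such a wildcard read boils down to (the folder path and the columns are hypothetical placeholders, not from the original post):

```sas
/* Sketch only: the path and the columns (id, amount, txn_date)
   are placeholders - replace them with your own layout. */
data work.detail_all;
  length source_file $260;
  /* The wildcard makes INFILE read every matching file in turn;
     FILENAME= records which physical file the current record came from. */
  infile "/data/in/Detail_*.csv" dsd dlm=',' truncover filename=src;
  source_file = src;
  input id :$20. amount :best12. txn_date :yymmdd10.;
  format txn_date yymmdd10.;
run;
```

Capturing the source file name as a column is optional, but it often helps later when auditing which file a row came from.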

 

Should your .csv's have a header row, then things become a bit more involved and you need to give us as much detail as you can (i.e. what the folder containing the .csv's looks like - read everything in it or just files following a naming pattern, how often you need to run the process, whether this is a fully production-worthy implementation, etc.).

 

One way to go would be to first create a listing of .csv names (user-written code), load these names into a temporary SAS Work table, and then use this Work table as the control table for a loop transformation. Inside the loop transformation (the inner job) there is external file metadata (this time with a macro variable in double quotes as the file name - the parameter passed to the inner job), a File Reader, and an Append node which collects all the data read during the iterations through the inner job.
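For the first step - building the control table of file names - user-written code along these lines is one possibility (the directory path is a placeholder, and the PIPE engine requires the XCMD system option to be allowed in your environment):

```sas
/* Sketch: list the matching .csv files into a Work table that can
   serve as the control table for the loop transformation.
   On Windows you would use a 'dir /b' style command instead of 'ls'. */
filename dirlist pipe 'ls /data/in/Detail_*.csv';

data work.csv_list;
  length filename $260;
  infile dirlist truncover;
  input filename $260.;
run;

filename dirlist clear;
```

If XCMD is disallowed at your site, the same listing can be built with the DOPEN/DREAD directory functions instead of a pipe.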

 

If you've got header rows then there is possibly also a "hack" option which would still allow you to use the simple wildcard approach - but it's something I would first need to test myself before I can propose it.
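For reference, one commonly used technique for combining a wildcard read with header rows (not necessarily the "hack" Patrick has in mind) relies on the EOV= option, which flags the first record of each new file so that its header line can be discarded; path and columns below are placeholders:

```sas
/* Sketch: EOV= sets the flag variable to 1 when the first record of
   each subsequent file is read; _N_ = 1 covers the header line of
   the very first file. The flag must be reset manually. */
data work.detail_all;
  infile "/data/in/Detail_*.csv" dsd dlm=',' truncover eov=newfile;
  input @;                       /* hold the record to inspect it   */
  if _n_ = 1 or newfile then do; /* this is a header row - skip it  */
    newfile = 0;
    delete;
  end;
  input id :$20. amount :best12.;
run;
```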

 

I suggest that as a first step you implement a DIS job which reads a single .csv into your target table and we go from there.

 

 

LinusH
Tourmaline | Level 20
Short version of @Patrick's answer: your second option is definitely the prime choice.
An alternative, if you want to avoid the complexity of inner/outer job loops, is to use an aggregate file location. This would be simpler DIS-wise, but it makes it a bit harder to add log/audit functionality.
Either way, you may need to include functionality that prevents a file from being read twice (look it up in an audit table, or physically move the file after reading).
Data never sleeps
