BookmarkSubscribeRSS Feed
Ody
Quartz | Level 8 Ody
Quartz | Level 8

Hello all,

Thanks in advance for your thoughts.

I have a directory with several hundred files I need to add to a data set, DataSetX. The csv file name is always the same length/name. The only difference is the last seven digits preceding the file extension are the day of the file, _MMDDYY. For example, myfile_111713.csv.

My first attempt at this, as I'm fairly green to programming, was to assign the day of the file to a variable, something like %let=MMDDYY, and manually change the date every time I add a new file to the data set. I know there's a more efficient, less time consuming way of doing this. So my questions are this: How can I modify my infile statement to loop through all the files in the directory? Since more files will be added to this directory in batches that need to be added to the data set, how do I ensure the code will only add new files?

I greatly appreciate any advice/help!

5 REPLIES 5
data_null__
Jade | Level 19

It sounds like your CSVs all have the same structure, same fields in same order, if so you could use a fileref with a wild-card reference
filename mycsv 'path/root_*.csv';  Then you can read every file with ONE data step INFILE statement.  You will want to consult the online documention for the specifics and also perhaps you will need INFILE statement options EOV and FILENAME.

Ody
Quartz | Level 8 Ody
Quartz | Level 8

Thanks for the information. 

I did not know I could read all the files in at once. I will most certainly look into that! I think that could be helpful in certain situations but for my particular scenario I don't think that will work.

Although the data in the files is standardized (i.e. same structure, same fields in same order, etc) the dates for these records are only contained in the file name. The code I've written was taking my manually updated %let variable and adding that as a field to the data; it is the only way to tell what day the records were processed.

data_null__
Jade | Level 19

The FILENAME INFILE statement option contains the file name of the file that is currently being read.  You can get the information you need there.  The EOV option tells you when a new file begins.  Look at the help and see if you can figure it.  Also search for EOV in this forum and you should find example.

https://communities.sas.com/message/154344#154344

Reeza
Super User

Basically,SAS creates a variable that contains the filename and you can parse that out instead of manually creating it.

This is what I used, but I can't remember if it worked correctly so double check it Smiley Happy


As for reading new files that are not in your list, if they follow a standard you can use a macro solution to generate the file names (ie generate dates for last month) or you can read the list of files using a pipe command and compare to the data in the dataset already. Then files that aren't read can be read and appended using your original macro solution.



data try01;

length filename txt_file_name $256;

retain txt_file_name;

infile "Path\*.txt" eov=eov filename=filename truncover;

 

if _n_ eq 1 or eov then do;

txt_file_name = scan(filename, -2, ".\");

eov=0;

end;

input

*Place input code here;

run;

Ody
Quartz | Level 8 Ody
Quartz | Level 8

Thanks for pointing me in the right direction. I'll do some research and let you know how it goes. Thanks again!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2365 views
  • 0 likes
  • 3 in conversation