12-08-2013 10:19 AM
Thanks in advance for your thoughts.
I have a directory with several hundred files I need to add to a data set, DataSetX. The csv file name is always the same length/name. The only difference is the last seven digits preceding the file extension are the day of the file, _MMDDYY. For example, myfile_111713.csv.
My first attempt at this, as I'm fairly green to programming, was to assign the day of the file to a variable, something like %let=MMDDYY, and manually change the date every time I add a new file to the data set. I know there's a more efficient, less time consuming way of doing this. So my questions are this: How can I modify my infile statement to loop through all the files in the directory? Since more files will be added to this directory in batches that need to be added to the data set, how do I ensure the code will only add new files?
I greatly appreciate any advice/help!
12-08-2013 12:04 PM
It sounds like your CSVs all have the same structure, same fields in same order, if so you could use a fileref with a wild-card reference
filename mycsv 'path/root_*.csv'; Then you can read every file with ONE data step INFILE statement. You will want to consult the online documention for the specifics and also perhaps you will need INFILE statement options EOV and FILENAME.
12-10-2013 08:16 AM
Thanks for the information.
I did not know I could read all the files in at once. I will most certainly look into that! I think that could be helpful in certain situations but for my particular scenario I don't think that will work.
Although the data in the files is standardized (i.e. same structure, same fields in same order, etc) the dates for these records are only contained in the file name. The code I've written was taking my manually updated %let variable and adding that as a field to the data; it is the only way to tell what day the records were processed.
12-10-2013 08:31 AM
The FILENAME INFILE statement option contains the file name of the file that is currently being read. You can get the information you need there. The EOV option tells you when a new file begins. Look at the help and see if you can figure it. Also search for EOV in this forum and you should find example.
12-10-2013 10:47 AM
Basically,SAS creates a variable that contains the filename and you can parse that out instead of manually creating it.
This is what I used, but I can't remember if it worked correctly so double check it
As for reading new files that are not in your list, if they follow a standard you can use a macro solution to generate the file names (ie generate dates for last month) or you can read the list of files using a pipe command and compare to the data in the dataset already. Then files that aren't read can be read and appended using your original macro solution.
length filename txt_file_name $256;
infile "Path\*.txt" eov=eov filename=filename truncover;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -2, ".\");
*Place input code here;