BookmarkSubscribeRSS Feed
kmardinian
Quartz | Level 8

Hi,

 

I am trying to pull multiple files from an output folder where each are in this format; SUBJECTXXX_datetime.csv. So each folder has a different Subject ID and a different date stamp. There can multiple files in this one folder.

 

I'd like to pull all the files that are in this folder into SAS and merge them into one dataset, but I am unsure how to do this without calling each file individually. Once those files are in that one dataset, I'd like to move them to another folder, to then run my program again for the next batch of files

 

I appreciate any help, thank you!

 

27 REPLIES 27
Reeza
Super User
You can read all files at once or read each folder at a time. First you need to create a list of the files and a working program to read one file.
Do you have either of those? Once you do, making it happen for every file is a pretty trivial task 🙂
kmardinian
Quartz | Level 8

Hi Reeza,

 

Thank you!

 

I do not have that in my program yet, I was not sure how to call all the files without having to name them all individually.

Is there a good way of doing that?

 

I had found this online:

   %include "&dir\&name" /source2;
   Filename filelist pipe "dir /b /s c:\temp\*.sas"; 
                                                                                   
   Data _null_;                                        
     Infile filelist truncover;
     Input filename $100.;
     Put filename=;
   Run; 

Would that work well for what I need to do?

 

And then I can merge them without calling them separately as well?

Reeza
Super User

@kmardinian wrote:

Hi Reeza,

 

Thank you!

 

I do not have that in my program yet, I was not sure how to call all the files without having to name them all individually.

Is there a good way of doing that?

 

I had found this online:

   %include "&dir\&name" /source2;
   Filename filelist pipe "dir /b /s c:\temp\*.sas"; 
                                                                                   
   Data _null_;                                        
     Infile filelist truncover;
     Input filename $100.;
     Put filename=;
   Run; 

Would that work well for what I need to do?

 

And then I can merge them without calling them separately as well?


I would modify it to save it to a data set - check that it lists all the files you need and not the directories. 

You may need to filter it if you have other files in the same folder. 

 

The reason I'm not suggesting wildcards, as @Tom  is, is because you have files in multiple folders and I don't think wildcards can handle that without multiple calls. And I'm assuming by 'merging' you mean append and add in the subject name as a field rather than merge to a wide data set because that doesn't makes sense to me. You can use his approach and repeat it several times or you can find all the file names and read them into a single data set in one step. 

 

The infile documentation has some examples but once you provide the code to read a single file we can help you convert it to the code you need to read all files. 

 

Check that this gives you the file list with all the files you need to read.

 

   Filename filelist pipe "dir /b /s c:\temp\*.csv"; 
                                                                                   
   Data myFileList;                                        
     Infile filelist truncover;
     Input filename $256.;
   Run; 

 

kmardinian
Quartz | Level 8

I was able to read in each file with this:

 

%let path1=C:\Users\K.M\Desktop\Reads\;


libname path1 "&path1";

 

%let dir=&path1;


filename xlfile pipe "dir ""&dir"" /tc /a-d" lrecl=200;

data in;
infile xlfile length=len end = eof;
input char $varying400. len;
if index(char,'.csv');
filename=scan(char,5,' ');
if filename =:'SUBJECT-';
run;

 

So now I have an output dataset with the variables char and filename. So now I have to figure out how to use the variable filename to call each file so I can merge them into one dataset. Is there a good way of doing this?

Reeza
Super User
Do you have the code to read a single file? I'll help you put them together once you post that code.
Tom
Super User Tom
Super User

So if you have a dataset named IN with a variable named CHAR that has the name of the file to read then you can use that to drive reading all of the files in one step.

data want ;
  set in ;
  fname=char ;
  infile csv filevar=fname dsd firstobs=2 truncover end=eof;
  do while (not eof);
    ... code to read the data from a line ...
    output;
  end;
run;
kmardinian
Quartz | Level 8

Hi Tom,

 

I don't quite understand how this code will import all of the individual files one by one?

 

And what do you mean by this line "

... code to read the data from a line ... 

 

Thank you,

Kristina

Reeza
Super User
data want ;
  set in ; *data set with list of file names and paths to read in;

 *store filename so you can get information later;
  fname=char ;


  infile csv filevar=fname dsd firstobs=2 truncover end=eof; 
*note fname is used here as filevar, which controls which file is read; *note eof which says that its the end of a file and avoids reading first line of new file; do while (not eof); INPUT tested input data statement to read your file; output; end; run;
kmardinian
Quartz | Level 8

I'm sorry for not understanding this clearly. But the do loop only needs to include an input statement listing all the variables in the files?

Reeza
Super User

@kmardinian wrote:

I'm sorry for not understanding this clearly. But the do loop only needs to include an input statement listing all the variables in the files?


You should have the Length or Format statements as indicated by Tom, but it's why I've said from the start you need two things to automate this: 

 

1. File list -> now solved

2. Working code to read one file, via a data step -> outstanding

 

The code to read the single file, whether it includes INFORMATS, FORMATS, LENGTH statements, must include an INPUT statement. 

 

So within that do loop you place the code you've tested as working to read the file. I would start by limiting your input data set and testing it, add the obs=4 to the SET statement to test it out on a small set of files first. Once it's working, remove that option.

 

set in (obs=4);
Tom
Super User Tom
Super User

The main data step will read in the dataset with the file names one observation at a time.

It then tells the INFILE statement which variable contains the NAME of the file to read.  It also tells the INFILE statement to skip the first line of the file (assuming your files have a header row) and set the variable EOF to be true when it has read the last line from that file.

 

The DO loop will then let you read each line from the file.  You will need to have a explicit OUTPUT statement since each iteration of the data step will read an entire file and not just one observation.

 

I left out the part about how to read the data from the files since you did not supply the information needed to do that.  In general it is trivial to write code to read from a delimited file. First define the variable (easiest if you define them in the order they appear in the file). The assign any NEEDED informats or WANTED formats using INFORMAT and/or FORMAT statement. Then read the line using an INPUT statement.  Simple example:

length id $20 age 8 dob 8 class $20 ;
informat dob date.;
format dob date9.;
input id -- class;
kmardinian
Quartz | Level 8

Hi Tom,

 

You mentioned like Reeza that I have failed to supply the information needed to read in the data. What information do you need for that exactly? A proc import statement or an infile statement? Isn't that already in my line of code?

 

Thank you!

 

Tom
Super User Tom
Super User

@kmardinian wrote:

Hi Tom,

 

You mentioned like Reeza that I have failed to supply the information needed to read in the data. What information do you need for that exactly? A proc import statement or an infile statement? Isn't that already in my line of code?

 

Thank you!

 


You don't want to use PROC IMPORT to read multiple delimited files. It will guess at how to define the variables based on the limited sample of data that appears in the one file it is currently processing. So you will end up with incompatible datasets.  Plus you will need to run a separate PROC step for each individual file.

 

You did not supply any INPUT statements (or other code) to read ANY of the files.  You did supply code to get the LIST of files, so that is why I said start with the dataset that has the list of files.

 

Here is an example of a program that reads delimited data. I will use in-line data using | as the delimiter to make the example clearer.

data want;
  infile cards dsd dlm='|' truncover firstobs=2;
  length name $20 age 8 dob 8 class $20 ;
  informat dob date.;
  format dob date9.;
  input name -- class;
cards;
Name|Age|DOB|Class
Fred|20|01JAN1999|Computer Science
Jill|23|02-JAN-1997|Mathematics
;

If you have many files all in the same format then you should have somewhere a document that describes the order of the fields and definition of each field.  Use that to help you write the code. If you don't have such a document then try running PROC IMPORT on just one of the files and look at the code it generates. It will be ugly with many unneeded (unwanted) formats and informats attached to the variables, but it should let you have a starting point. You will also want to perhaps pad the lengths of any character variables in case future files have longer values.

kmardinian
Quartz | Level 8

Is there anyway that I can just ask SAS to spit out all the variables without listing their individual lengths and formats. I have over 50 variables in each of these files, so I was able to copy and paste them all into the input statement but writing out each length and format would be super tedious.

 

Also, it does not seem to be recognizing my file path for some reason.

 

55
56 data want ;
57 set in ;
58 fname=filename ;
59 infile xlfile filevar=fname dsd firstobs=2 truncover end=eof;
60 do while (not eof);
61
62 input studyID study Date readID  --------------------------------------------------------------------- ;
63
64 output;
65 end;
66 run;

ERROR: Physical file does not exist, C:\SAS\Config\Lev1\SASApp\STAR-SUBJECT07.csv.
char=07/22/2019 01:06 PM 2,116 STAR-SUBJECT07.csv filename=STAR-SUBJECT07.csv num=1 fname=STAR-SUBJECT07.csv
eof=0 studyID=. study=. Date=. readID=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.IN.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 7 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 27 replies
  • 1178 views
  • 4 likes
  • 5 in conversation