I have a series of similar infile macros for various CSV file directories. They all follow the same pattern. I create a table of file names that I pass to a macro variable using proc sql. I then run the infile macro to loop through the directory and ingest the data. The macro looks like this:
%macro ingest; data ingest_data; length field1 $50 field2 $50 field3 $50 ; %do i = 1 %to &num_obs; infile "&&file&i" dlm = ',' dsd firstobs=2 missover lrecl=32767 encoding="wlatin1"; input field1 :$ field2 :$ field3 :$ ; %end; run; %mend;
The latest program I'm working on is doing something odd and I'm not sure why. I think I know why, but not sure the cause.
When I ingest the data from a new directory, the record count of the first file is being set for all subsequent files, even if those files have more records. So all CSV files are ingested, but its only ingesting, for example, 1700 records per file instead of how ever many records are in the file.
My &num_obs is only calculating total file counts. Is the system obs variable being set somehow? I havent encountered this in any of the previous macros I've put together. Would appreciate any insight. Thanks.
That macro does not look like it will generate valid SAS code.
You are generating a data step that looks like:
data want;
  infile one ;
  infile two;
  ...
  infile last;
  input;
run;If you want to read from multiple files you need to either run multiple data steps.
Or put the INPUT (and OUTPUT) statements inside of a loop so you can read all of the lines from the file in one pass of the data step.
But in that case you no longer need the %DO loop so there is no need to bother creating a macro.
* Assume there exist macro variables named NUM_OBS and FILE1 ... FILE&num_obs ;
data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  do i=1 to &num_obs;
    filename = symget(cats('file',i));
    infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
    do while (not eof);
      input field1-field3;
      output;
    end;
  end;
  drop i;
  stop;
run;And you probably do not need the macro variables either since I assume you created them from a dataset. So just use the dataset to drive the loop. Say you have a dataset named LIST with a variable named FILENAME.
* Use a dataset to secify the list of files ;
data ingest_data;
  set list;
  length field1-field3 $50 ;
  infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
  do while (not eof);
    input field1-field3;
    output;
  end;
run;Another method is to use the %DO loop to generate a FILENAME statement that creates a FILEREF that points to all of the files.
filename csv (
%do i = 1 %to &num_obs; 
  "&&file&i" 
%end;
);
Then you can use that fileref in your data step. But this time use the FILENAME= option to have SAS populate a data step variable that indicates the current file being read so you can detect the and skip the header lines.
data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  infile csv filename=filename dsd truncover encoding="wlatin1" end=eof;
  input @;
  if filename ne lag(filename) then delete ;
  input field1-field3;
run;
That macro does not look like it will generate valid SAS code.
You are generating a data step that looks like:
data want;
  infile one ;
  infile two;
  ...
  infile last;
  input;
run;If you want to read from multiple files you need to either run multiple data steps.
Or put the INPUT (and OUTPUT) statements inside of a loop so you can read all of the lines from the file in one pass of the data step.
But in that case you no longer need the %DO loop so there is no need to bother creating a macro.
* Assume there exist macro variables named NUM_OBS and FILE1 ... FILE&num_obs ;
data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  do i=1 to &num_obs;
    filename = symget(cats('file',i));
    infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
    do while (not eof);
      input field1-field3;
      output;
    end;
  end;
  drop i;
  stop;
run;And you probably do not need the macro variables either since I assume you created them from a dataset. So just use the dataset to drive the loop. Say you have a dataset named LIST with a variable named FILENAME.
* Use a dataset to secify the list of files ;
data ingest_data;
  set list;
  length field1-field3 $50 ;
  infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
  do while (not eof);
    input field1-field3;
    output;
  end;
run;Another method is to use the %DO loop to generate a FILENAME statement that creates a FILEREF that points to all of the files.
filename csv (
%do i = 1 %to &num_obs; 
  "&&file&i" 
%end;
);
Then you can use that fileref in your data step. But this time use the FILENAME= option to have SAS populate a data step variable that indicates the current file being read so you can detect the and skip the header lines.
data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  infile csv filename=filename dsd truncover encoding="wlatin1" end=eof;
  input @;
  if filename ne lag(filename) then delete ;
  input field1-field3;
run;
Makes sense, thanks for the feedback.
Your macro creates this:
data ...;
length ...:
infile file1;
input ...;
infile file2;
input ...;
infile file3;
input ...;
/* and so on */
run;So, in every iteration of the DATA step, a record is read from each file; as soon as one of the INPUTs tries to read past EOF, the step terminates, regardless of records still present in other files. The shortest file determines the number of records read.
See example 5 in the documentation of the INFILE Statement for how to read multiple files in one step.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
