BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ody
Quartz | Level 8 Ody
Quartz | Level 8

I have a series of similar infile macros for various CSV file directories. They all follow the same pattern. I create a table of file names that I pass to a macro variable using proc sql. I then run the infile macro to loop through the directory and ingest the data. The macro looks like this:

 

%macro ingest;
data ingest_data;
length 
field1 $50
field2 $50
field3 $50
;

%do i = 1 %to &num_obs; 
infile "&&file&i" dlm = ',' dsd firstobs=2 missover lrecl=32767 encoding="wlatin1";
input
field1 :$
field2 :$
field3 :$
;
%end;
run;
%mend;

The latest program I'm working on is doing something odd and I'm not sure why. I think I know why, but not sure the cause.

 

When I ingest the data from a new directory, the record count of the first file is being set for all subsequent files, even if those files have more records. So all CSV files are ingested, but its only ingesting, for example, 1700 records per file instead of how ever many records are in the file.

 

My &num_obs is only calculating total file counts. Is the system obs variable being set somehow? I havent encountered this in any of the previous macros I've put together. Would appreciate any insight. Thanks.  

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

That macro does not look like it will generate valid SAS code.

 

You are generating a data step that looks like:

data want;
  infile one ;
  infile two;
  ...
  infile last;
  input;
run;

If you want to read from multiple files you need to either run multiple data steps. 

 

Or put the INPUT (and OUTPUT) statements inside of a loop so you can read all of the lines from the file in one pass of the data step.

 

But in that case you no longer need the %DO loop so there is no need to bother creating a macro.

* Assume there exist macro variables named NUM_OBS and FILE1 ... FILE&num_obs ;
data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  do i=1 to &num_obs;
    filename = symget(cats('file',i));
    infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
    do while (not eof);
      input field1-field3;
      output;
    end;
  end;
  drop i;
  stop;
run;

And you probably do not need the macro variables either since I assume you created them from a dataset.  So just use the dataset to drive the loop.  Say you have a dataset named LIST with a variable named FILENAME.  

* Use a dataset to secify the list of files ;
data ingest_data;
  set list;
  length field1-field3 $50 ;
  infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
  do while (not eof);
    input field1-field3;
    output;
  end;
run;

Another method is to use the %DO loop to generate a FILENAME statement that creates a FILEREF that points to all of the files.

filename csv (
%do i = 1 %to &num_obs; 
  "&&file&i" 
%end;
);

Then you can use that fileref in your data step.  But this time use the FILENAME= option to have SAS populate a data step variable that indicates the current file being read so you can detect the and skip the header lines.

data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  infile csv filename=filename dsd truncover encoding="wlatin1" end=eof;
  input @;
  if filename ne lag(filename) then delete ;
  input field1-field3;
run;

 

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

That macro does not look like it will generate valid SAS code.

 

You are generating a data step that looks like:

data want;
  infile one ;
  infile two;
  ...
  infile last;
  input;
run;

If you want to read from multiple files you need to either run multiple data steps. 

 

Or put the INPUT (and OUTPUT) statements inside of a loop so you can read all of the lines from the file in one pass of the data step.

 

But in that case you no longer need the %DO loop so there is no need to bother creating a macro.

* Assume there exist macro variables named NUM_OBS and FILE1 ... FILE&num_obs ;
data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  do i=1 to &num_obs;
    filename = symget(cats('file',i));
    infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
    do while (not eof);
      input field1-field3;
      output;
    end;
  end;
  drop i;
  stop;
run;

And you probably do not need the macro variables either since I assume you created them from a dataset.  So just use the dataset to drive the loop.  Say you have a dataset named LIST with a variable named FILENAME.  

* Use a dataset to secify the list of files ;
data ingest_data;
  set list;
  length field1-field3 $50 ;
  infile csv filevar=filename dsd firstobs=2 truncover encoding="wlatin1" end=eof;
  do while (not eof);
    input field1-field3;
    output;
  end;
run;

Another method is to use the %DO loop to generate a FILENAME statement that creates a FILEREF that points to all of the files.

filename csv (
%do i = 1 %to &num_obs; 
  "&&file&i" 
%end;
);

Then you can use that fileref in your data step.  But this time use the FILENAME= option to have SAS populate a data step variable that indicates the current file being read so you can detect the and skip the header lines.

data ingest_data;
  length filename $300 ;
  length field1-field3 $50 ;
  infile csv filename=filename dsd truncover encoding="wlatin1" end=eof;
  input @;
  if filename ne lag(filename) then delete ;
  input field1-field3;
run;

 

Ody
Quartz | Level 8 Ody
Quartz | Level 8

Makes sense, thanks for the feedback.

Kurt_Bremser
Super User

Your macro creates this:

data ...;
length ...:
infile file1;
input ...;
infile file2;
input ...;
infile file3;
input ...;
/* and so on */
run;

So, in every iteration of the DATA step, a record is read from each file; as soon as one of the INPUTs tries to read past EOF, the step terminates, regardless of records still present in other files. The shortest file determines the number of records read.

 

See example 5 in the documentation of the INFILE Statement for how to read multiple files in one step.

Ody
Quartz | Level 8 Ody
Quartz | Level 8
Thanks

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 414 views
  • 6 likes
  • 3 in conversation