DATA Step, Macro, Functions and more

help with INFILE/INPUT problem when reading multiple text files with same format

Reply
Frequent Contributor
Posts: 91

help with INFILE/INPUT problem when reading multiple text files with same format

SAS folks-

I've been playing around with this for quite awhile, and need some help. A colleague has lots of txt files in a directory, and is trying to read them with SAS.  The filename includes the date.

Unfortunately, they have a little twist that I have so far been unable to unravel - the data we want starts on line 7, is multiple (unknown) records per line, and -9999 is null data and does not need to be read.

I have no problem inputting one file in the infile statement.  This reads the multiple values per line, kicks out the null data, and I am able to pull out the date from the filename.

DATA STRANGEDATA ;

  LENGTH MYINFILE  $ 99 ;

    INFILE 'X:\ES\EA\Wildlife\2000\abc20000601.txt' FILENAME=MYINFILE FIRSTOBS=7 LRECL=35000 PAD ; 

      INPUT DEPTH @@ ;

       IF DEPTH > 0 ;

         SAVEFILE=MYINFILE ;

         Y=COMPRESS(SAVEFILE,,'DK') ;

         SASDATE=MDY(SUBSTR(Y,9,2), SUBSTR(Y,11,2), SUBSTR(Y,5,4)) ;

         FORMAT SASDATE DATE9. ;

      OUTPUT ;

RUN;

However, when I change the filename to include a wildcard, I can't stop it reading the first 6 lines of subsequent files.  I've been fooling around with EOV= but can't seem to make it work.

INFILE 'X:\ES\EA\Wildlife\2000\abc*.txt' FILENAME=MYINFILE FIRSTOBS=7 LRECL=35000 PAD ;

So, I went at it from another method - making a file with the filenames, and reading the files in turn.

This successfully picks up all the files in the list, but only the first entry on a line. 

x 'del index.lst';

x 'dir /b X:\ES\EA\Wildlife\2000\abc*.txt > index.lst';

x 'exit' ;

DATA STRANGEDATA ;

  LENGTH MYINFILE $99 ;

    INFILE 'index.lst' PAD MISSOVER;

      INPUT MYINFILE $ 1-30;

       FULLFILE='X:\ES\EA\Wildlife\2000\'||MYINFILE ;

    INFILE FILE FILEVAR=FULLFILE END=DONE PAD MISSOVER FIRSTOBS=7 LRECL=35000;

       DO UNTIL (DONE);

         INPUT DEPTH ;

          IF DEPTH > 0 ;

           Y=COMPRESS(MYINFILE, , 'DK');

           SASDATE=MDY(SUBSTR(Y,5,2), SUBSTR(Y,7,2), SUBSTR(Y,1,4)) ;

           FORMAT SASDATE DATE9. ;

         OUTPUT;

       END;

RUN ;

Adding  @@ to the INPUT and deleting MISSOVER from INFILE to pick up the multiple occurrences per line gets nothing, and I know there should be observations from running the files separately.

I have to be missing something obvious here - any help you can give me would be much appreciated.  Three of the many files are attached here.

Thanks for any help you can give me! 

We are running 9.2 on Windows.

WendyT

Attachment
Attachment
Attachment
Contributor
Posts: 43

help with INFILE/INPUT problem when reading multiple text files with same format

Just a guess, but try adding "THEN DO" to the "IF DEPTH > 0" statement (plue a matching END statement.  A subsetting IF statement might not work in this context.

PROC Star
Posts: 7,363

help with INFILE/INPUT problem when reading multiple text files with same format

It would help if you posted what the resulting file "should" look like.  However, that said, I think you are making the task more complicated than it has to be.  Additionally, I doubt if the locations of month, day and year are where your code is looking.

Possibly, and only possibly, the following might do the entire task:

%let dir=c:\art\test\;

DATA STRANGEDATA ;

  LENGTH MYINFILE  $ 99 ;

  INFILE "&dir.*.txt" FILENAME=MYINFILE FIRSTOBS=7 LRECL=35000 PAD ;

  INPUT DEPTH @@ ;

  IF DEPTH > 0 ;

  SAVEFILE=MYINFILE ;

  Y=COMPRESS(SAVEFILE,,'DK') ;

  SASDATE=MDY(SUBSTR(Y,5,2), SUBSTR(Y,7,2), SUBSTR(Y,1,4)) ;

  FORMAT SASDATE DATE9. ;

  OUTPUT ;

RUN;

Super Contributor
Posts: 339

Re: help with INFILE/INPUT problem when reading multiple text files with same format

I'm not 100% sure it works with the INFILE statements but I believe the reason why your original approach reads the 7th obs only for the first file opened is because it is meant to do so. At least so it is for dataset options. You could try using the system option FIRSTOBS= for it to apply to each set. Depending on how SAS processes INFILE statements with a wild card, it may produce the expected results

options firstobs=7;

DATA STRANGEDATA ;

  LENGTH MYINFILE  $ 99 ;

    INFILE 'X:\ES\EA\Wildlife\2000\abc*.txt' FILENAME=MYINFILE LRECL=35000 PAD ; 

      INPUT DEPTH @@ ;

       IF DEPTH > 0 ;

         SAVEFILE=MYINFILE ;

         Y=COMPRESS(SAVEFILE,,'DK') ;

         SASDATE=MDY(SUBSTR(Y,9,2), SUBSTR(Y,11,2), SUBSTR(Y,5,4)) ;

         FORMAT SASDATE DATE9. ;

      OUTPUT ;

RUN;

It may or may not work but takes a minute to test. Otherwise, I can't think of any other way than to use macro to loop on all files in a root folder, do a scan condition on the name so that they are each of the form abc*.txt and then use that do build a %do %end loop to emulate writing every single infile statement each with the firstobs=7 statement.

I'm affraid however that SAS considers an infile statement with a wild card as a single file and thus the use of system options won't work.

Vincent

Super User
Super User
Posts: 6,502

Re: help with INFILE/INPUT problem when reading multiple text files with same format

There is a trick to using the EOV variable.  It does not get set until the first line of the next file is read.

Try this little experiment to see what is happening and how to generate a LINE # variable that you can use to skip the headers.

data _null_;

  do file='one','two','three';

    filename =cats("&path/",file,'.txt');

    file dummy filevar=filename;

    do i=1 to 10;

     put i file;

   end;

  end;

stop;

run;

data _null_;

  infile "&path/*.txt" end=eof eov=eov ;

  input @;

  if eov then line=1 ;

  else line+1;

  eov=0;

  if line <= 7 then delete ;

  put (_n_ line) (= z3.) '| ' _infile_ ;

run;

Ask a Question
Discussion stats
  • 4 replies
  • 306 views
  • 0 likes
  • 5 in conversation