Desktop productivity for business analysts and programmers

Loop-load multiple flat files and validate each one

Reply
Occasional Contributor
Posts: 5

Loop-load multiple flat files and validate each one

[ Edited ]

Hi there

 

I'm trying to solve a problem which I can do easily in other languages but am stuck doing in SAS. Using Enterprise Guide 5.1 (64-bit) on a Windows server. The basic setup is as follows;

 

I get a manually generated output from a system which stores per-day files in a folder structure on a network drive, in a flat text file with structured widths. I need to infile each of these. There is an existing procedure to do this, which works fine, except for the fact it does not validate it is receiving a non-corrupted file, and cheerfully uploads garbage into a SAS table. Obviously the log file for dealing with 30 files in one read is so long it's barely worth reading.

 

And if an error isn't picked up then the source system doesn't keep data forever, so invalid files can't be re-created....

 

The basic file structure would be as follows, let's say each file is supposed to be 500 characters wide;

 

 FIELD FIELD                                   FINAL

  A      B    C  LONG FIELD D                  FIELD

 ----- ----- --- ---------------------   ..... =====

   ONE   AND   2 MAKEFOURBUTNOTFIVE                1

   ONE   AND   3 MAKEFOURBUTNOTFIVE                0

   TWO   AND   2 MAKEFOURBUTNOTFIVE           1

   TWO   AND   2 MAKEFOURBUTNOTFIVE                1

   TWO   AND   2 MAKEFOURB

   TWO   AND   2 MAKEFOURBUTNOTFIVE                1

                 ==================

                                  6

 

So as you can see lines #3 and #5 are evidence of a corrupted/incorrectly generated file.

#3 I want to gracefully deal with, #5 I want to quit out the whole process, and not process any of the other files at all. Red lights and sirens all round.

The final line is a summary of how many records were output by the source system. If SAS hasn't read in the same number of Obs as the file says it supplied, then again, red lights and sirens. I am assuming that if I want to do this then I need to be able to "see" two lines in an INPUT statement, and therefore the INFILE should have n=2 as an option... correct?

 

 

What I need to do is;

if Length(CurrentLine) = 500 then

  Input as normal

elseif length(CurrentLine) = 494 then

    pickup the LAST FIELD from the incorrect position but input the rest of the line as normal.

else

    scream a warning in ten foot letters and stop everything

end

 

If LONGFIELD = "==================" then

   Check the number of read-in obs against the entry in the next line.

   if ReadInObs <> ChecksumObs then

    scream a warning in ten foot letters and stop everything

end

 

 

On another note, I am attempting to write a loop to go through the day/month files, rather than the very long piece of "name each day file, generate a per-day table, add the per-day table to the master table" process. But it doesn't seem to want to take a variable filename...

 

%Let MonthA = Oct17;
%put &MonthA;
%Let MonthB = 1017;
%put &MonthB;

Data myLib.MasterFile_Day;
do i = 1 to 31;
	fname = "/Test/&MonthA/FixedpartofFileName " || put( i, z2.) || "&MonthB..txt";
	tmpname = "MasterFile_" || put( i, z2.);
	put fname;
	put tmpname;
	infile fname Length=LineLen line=currLine col=currColumn linesize=500  n=2 recfm = v lrecl = 32000 firstobs = 8 truncover;
	%Filereadin /* Macro for doing the input statement */

end;
run;

Running this code gets me "ERROR: No logical assign for filename FNAME."

 

Sorry, can't upload samples as it's sensitive data.

Super User
Posts: 8,054

Re: Loop-load multiple flat files and validate each one

SAS will set _error_ during data step execution if the input statement encounters an error.

So you might do

%let inerr=0;
data table;
infile in;
input



;
if _error_ then call symput('inerr','1');
run;

so can later check &inerr and take appropriate action.

 

Or you run a

grep ERROR my.log

(insert your logfile name)

in your sasbatch script, and if grep succeeds, throw an error that the scheduler catches.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
PROC Star
Posts: 1,177

Re: Loop-load multiple flat files and validate each one

[ Edited ]

On your first query, here's a program that handles some of your issues. I can't really tell what your requirements are for processing records over two lines, so I can't do anything about that for now. Note that the first two steps just create a flat file in the format you're working with, to give the third program something to work on.

 

Tom

 

%let FileDir =C:\SomDir;

data have;
	length longrec $1000;
	input;
	longrec = _infile_;
	cards;
ONEAND2MAKEFOURBUTNOTFIVE    1
ONEAND3MAKEFOURBUTNOTFIVE    0
TWOAND2MAKEFOURBUTNOTFIVE 1
TWOAND2MAKEFOURBUTNOTFIVE    1
TWOAND2MAKEFOURB
TWOAND2MAKEFOURBUTNOTFIVE    1
==================
6
run;

data _null_;
	file "&FileDir.\testfile.txt" lrecl=32767;
	set have;
	put longrec;
run;

data Check1(keep=InputRecord) ErrorMessages(keep=ErrorMessage);
	length InputRecord $32767 ErrorMessage $256;
	retain EndSwitch 0 InputRecordCount 0;
	infile "&FileDir.\testfile.txt" lrecl=32767;
	input;

	/* This test can be adjusted, based on how many equal signs you're expecting, where they are, if it's variable, etc. */
	if _infile_ = "==================" then EndSwitch = 1; /* This line is the equal sign line */
	else if EndSwitch = 1 then EndSwitch = 2; /* This line is the line after the equal sign line, namely the record count line */

	if EndSwitch = 0 
		then do;

		/* This is a normal line */
		InputRecordCount = InputRecordCount + 1;
		InputRecordLength = length(_infile_);

		/* Do the stuff needed to parse the record into fields here, maybe using the "scan" function */
		InputRecord = _infile_;
		output Check1;

		if InputRecordLength ^= 30
			then do;
			ErrorMessage = "Record number " || strip(put(_n_, best15.)) || " is the wrong length, " || strip(put(InputRecordLength, best15.));

			/* This is one way to handle errors, there are millions in SAS */
			output ErrorMessages;
		end;
	end;

	if EndSwitch = 2
		then do;

		/* This is the line containing the record count */
		RecordedCount = input(strip(_infile_), best15.);

		if InputRecordCount ^= RecordedCount
			then do;
			ErrorMessage = "Incorrect recorded record count " || strip(put(RecordedCount, best15.)) || " compared to measured record count " || strip(put(InputRecordCount, best15.));
			output ErrorMessages;
		end;
	end;
run;
Occasional Contributor
Posts: 5

Re: Loop-load multiple flat files and validate each one

Thanks, I'll start testing these and see how things go!
Occasional Contributor
Posts: 5

Re: Loop-load multiple flat files and validate each one

So I am now getting to a point where the error isn't making any sense. Whether the InputRecordCount is different to, or the same as the RecordedCount, the same error generates. RecordedCount doesn't seem to get populated but I've no idea why?

 

 

File with an incorrect RecordedCount:

 

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      

33466                            33,000. 30
    ZONE  222222222222222222222223323330
    NUMR  0000000000000000000000033C000D
InputRecord=. ErrorMessage=Incorrect recorded record count . compared to measured record count 33464 EndSwitch=2
InputRecordCount=33464 EndLine=0 Sirens=1 InputRecordLength=30 RecordedCount=. _ERROR_=1 _INFILE_=33,000 _N_=33466
ERROR: Execution terminated by an ABORT statement at line 21 column 22.

33467     . 1
    ZONE  0
    NUMR  D
InputRecord=  ErrorMessage=  EndSwitch=2 InputRecordCount=33464 EndLine=0 Sirens=1 InputRecordLength=. RecordedCount=. _ERROR_=1
_INFILE_= _N_=33467

 

File with a correct RecordedCount:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      

33506                            33,497. 30
    ZONE  222222222222222222222223323330
    NUMR  0000000000000000000000033C497D
InputRecord=. ErrorMessage=Incorrect recorded record count . compared to measured record count 33497 EndSwitch=2
InputRecordCount=33497 EndLine=0 Sirens=1 InputRecordLength=30 RecordedCount=. _ERROR_=1 _INFILE_=33,497 _N_=33499
ERROR: Execution terminated by an ABORT statement at line 21 column 22.

33507     . 1
    ZONE  0
    NUMR  D
InputRecord=  ErrorMessage=  EndSwitch=2 InputRecordCount=33497 EndLine=0 Sirens=1 InputRecordLength=. RecordedCount=. _ERROR_=1
_INFILE_= _N_=33500

 

 

 

Occasional Contributor
Posts: 5

Re: Loop-load multiple flat files and validate each one

Ok, so I have "solved" the RecordedCount not being loaded, but it throws up a new problem;

 

RecordedCount will only get filled if I match the length of the informat exactly.

 

In the main "read-in" Input statement, something like

input @ 12   Thing 18.

will work fine, and if the number of characters that make up the numeric value at position 12 is, say, 6 in one line and 12 in the next, both are correctly read in as numbers.

 

 

However no matter what method of assigning a format to this variable I use, unless I exactly match the informat length to the number of characters which make up the total, it loads blank.

so assuming the text in the file says

           ==================
                       10,000

 

"input RecordedCount;" (which SAS help alleges will just assign numeric automatically if it sees a number) results in a blank

"input RecordedCount 30.;" - Blank

"input @23 RecordedCount comma6.;" - Blank

"input(_infile_,COMMA30.);"  - Blank

"input(_infile_,COMMA6.)" - Blank

"input(strip(_infile_),COMMA6.)" - Correct!

"input(strip(_infile_),COMMA5.)" - I have a number, but it's truncated.

 

Obviously, RecordedCount might be 10,000 in one file, and 99,999,999,999 in another, and 1 in another.

 

If I'm required to know how many characters are used by the informat, I can't just use Length(strip(_infile_)) to do that... can I?

Super User
Posts: 8,054

Re: Loop-load multiple flat files and validate each one

Look closely at this part of your log:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      

33466                            33,000. 30
    ZONE  222222222222222222222223323330
    NUMR  0000000000000000000000033C000D

The character at position 30 is a '0d'x, or carriage return. Every attempt to read a number that somehow incorporates this character will fail and produce a missing value. Read that with comma29., and you 'll be alright.

Alternatively, you should look at your whole file. If you have a '0d'x at every end of a line, then you should use text mode when copying your file from the DOS/Windows environment to UNIX. Your records will shrink by 1 character, and only contain readable characters.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 5

Re: Loop-load multiple flat files and validate each one

Posted in reply to KurtBremser
Fantastic, thank you. I had heard elsewhere that SAS would ignore CR / LF if it encountered it mid-column, which I wasn't sure about. COMMA29 does indeed work!

So am I correct in assuming that lengths for Informats are always "count from leftmost position" in these situations?
Super User
Posts: 8,054

Re: Loop-load multiple flat files and validate each one

LF or CRLF will always pose a special problem. When reading from text, LF (on UNIX) or CRLF (on Windows) will automatically terminate the current line and therefore input statement, and never appear in the data.

 

Did you have a

termstr=lf

option in the infile statement? If you read a DOS-formatted file with that, the CR will end up in the data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 8 replies
  • 237 views
  • 2 likes
  • 3 in conversation