BookmarkSubscribeRSS Feed
dropez
Calcite | Level 5
Hi there.

I am reading a DB2 log file into SAS and want to make some error testing in order to validate the data i'm reading.
One error i would like to catch but don't know how is when i try to read more data than what realy comes in each row of the log file.
For example, reading var1-var5 in a row of data with only four variables.
None of the infile options (flowover, missover, truncover, stopover, etc) suits me has i don't want to stop the process if that kind of error is found neither i want to ignore and set the values to missing or jump through the next line (flowover). I would like to output each of this error rows to an error dataset for further analysis and the valid ones to other(s) to continue processing.
I tried checking _error_ after each input but seems that in this case scenario _error_ is always set to 0 unless the stopover option is active, which works kind of like errorabend, stoping the process imediatly.

Any sugestions?

Best regards,
Pedro.
9 REPLIES 9
data_null__
Jade | Level 19
My limited testing indicates with MISSOVER and PAD infile statement options when the line is shorter than expected the value assigned to COLUMN= variable is LRECL+1. For example.

[pre]
1044 filename FT15F001 temp lrecl=64;
1045 data _null_;
1046 infile FT15F001 missover pad column=c;
1047 input a;
1048 put _all_;
1049 parmcards;
1054 ;

NOTE: The infile FT15F001 is:
(system-specific pathname),
(system-specific file attributes)

c=10 a=1 _ERROR_=0 _N_=1
c=3 a=2 _ERROR_=0 _N_=2
c=65 a=. _ERROR_=0 _N_=3
c=5 a=3 _ERROR_=0 _N_=4
NOTE: 4 records were read from the infile (system-specific pathname).
The minimum record length was 0.
The maximum record length was 8.
[/pre]


[pre]
filename FT15F001 temp lrecl=64;
data _null_;
infile FT15F001 missover pad column=c;
input a;
put _all_;
parmcards;
1
2

3
;
run;
[/pre]
dropez
Calcite | Level 5
I'm reading formated input, which is delimited by a '|', and each column is truncated to the length of the string in it. I don't have a clue about lrecl before actually reading the data.

my data is more or less like this:

row 1: aaa | 1 | bb | cccc | 22
row 2: aa | 23 | b | ccc | 345
row 3: a | 4 | bbb | cccc

I want to catch the missing column in row 3 and send it to errors.
Unless i'm missing something i think your solution doesn't apply to my case.
data_null__
Jade | Level 19
My solution would not work because with a delimiter other than BLANK you would need a trailing delimiter.

Why can't you just use MISSOVER if the i'th variable is missing your record is short.

Or you could use the _INFILE_ variable and count delimiters if the number don't suit ya, then take some action.
dropez
Calcite | Level 5
I realy need to catch this kind of errors in order to ensure the DW consistency.
I receive a pair of files each day, one being the DB2 log file and the other the DB2 structure at the time the log file was produced.
Reading something different than what the structure file indicates means one of two things:
1- The structure file is damaged.
2- The log file is damaged.

Neither one of this two options can be ignored as it will result in corrupting de DW.
I could just use STOPOVER option but that would delay the processing of hundreds of tables just because one single row and that is what i want to avoid.

I'll look if the _INFILE_ option suits my needs.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
To the OP: you really need to share the exact SAS code being used to read the DB2 log data (an external data file) - also info about where the external data file "originates" and equally important where the external file "resides" when you are accessing it from your SAS program.

With variable-length data records, you have an INFORMAT $VARYINGnnn. which may be suitable, but we really need to see exactly what you're working with -- hence the need for you to post in a reply, the SAS-generated LOG (DATA STEP) where you are attempting your external data input handling.

Scott Barry
SBBWorks, Inc.
dropez
Calcite | Level 5
Very simplified sample:
...
filename list pipe "ls -1 /stagingDelta/ftrans/LOGDB2*";
...
infile indata truncover;
input iteminlist $100.;
item=iteminlist;
infile dummy filevar=item end=done dlm='|' dsd lrecl=10000;
do while(^done);
input col1-col&colnumber $;
...

&colnumber is built based on the information on the structure file mentioned before!

I just want to know if it is possible to catch the "error" LOST CARD while processing, not loosing the CARD, but putting it to some kind of quarenteen and continue processing. I do not want no MISSOVER or TRUNCOVER, neither i want to stop all the process just because of one row (STOPOVER).
If there are less columns than it's supposed in that row, if instead of having &colnumber columns i have only &colnumber-1, i want to be able to catch it.
I am not able to know the lrecl of the row as each row as a different length, even if it is about the same table, which isn't a problem since my log file comes delimited by "|"
As i explained before, i have a log file (.txt) with all the inserts, deletes and updates that took place in DB2, in the same machine where my sas program is running. Along with that log file comes a structure file containing the definition of each table that comes in the log file (number of variables, each variable length, type, integrity constraints, keys, etc).
If that structure file tells me that i have to read 10 variables in each row of the log file for that table then i really need to do so, or else one of the two files is damaged and some manual work has to be done.
The problem is i can't seem to catch an error when SAS reads less information than it is supposed to. If i tell the script to read 10 variables but there are only 9 in that row the only way i can identify that is by setting the STOPOVER option in the filename statement, which imediatly stops the process. I want to avoid stopping the process, else, put that/those row/s to some kind of quarenteen file and process the remainig log.
data_null__
Jade | Level 19
> If i tell the script to read 10 variables but there are
> only 9 in that row the only way i can identify that

Count the delimiters.
Peter_C
Rhodochrosite | Level 12
> infile dummy filevar=item end=done dlm='|' dsd
> lrecl=10000;
> do while(^done);
> input col1-col&colnumber $;
> ...

>
> I just want to know if it is possible to catch the
> "error" LOST CARD while processing, not loosing the
> CARD, but putting it to some kind of quarenteen and


there is an infile option LENGTH= which names a variable which will be filled with the length of the "current" line
there is also option COLUMN= which names a variable to hold the current column position on the input line (at which the next INPUT statement would read with no other positioning directive)
You can read accross the line until the "next" position will be at the end of the line (FLOWOVER will need to be avoided)


> infile dummy filevar=item end=done dlm='|' dsd
col=col length=len truncover
> lrecl=10000;

> do while(^done);
array col(&colNumber) ; *** add $ if string not numeric*** ;

input @1 @ ; ** position at start of input line ;

do col_number = 1 to &colnumber while( col < len ) ;
input col(col_number) @ ;
end ;
if col_number LT &col_number then do;
* report the problem ;
end ;



alternatively examine the number of delimiters in the _infile_ buffer
(not good enough if the delimiter might be embedded in data, like in "erty|tyu" in:
2345|3456|"erty|tyu"|4567||
input @ ;
n_delim = count ( _file_, '|' );
provides a count of the delimiters on the input line
If that is always OK,
then check the last 1 or 2 characters on the file buffer
if substr( _file_, length( _file_ ),1) EQ '|' then do;
***** report last column is empty ;
end ;
Alternative check if the last column is empty
if scan( _file_, -1,'|' ) EQ ' ' then do;
***** report last column is empty ;
end ;
if scan( _file_, -2,'|' ) EQ ' ' then do;
***** report second last column is empty ;
end ;
data_null__
Jade | Level 19
> alternatively examine the number of delimiters in
> the _infile_ buffer
> not good enough if the delimiter might be embedded in
> data, like in "erty|tyu" in:
> 2345|3456|"erty|tyu"|4567||

How about counting the words.

[pre]
1951 filename FT15F001 temp;
1952 canWeCountWords:
1952! data _null_;
1953 infile FT15F001;
1954 input @;
1955 c = countW(_infile_,'|','MQ');
1956 put 'NOTE: ' c=;
1957 list;
1958 parmcards;
1961 ;;;;

NOTE: The infile FT15F001 is:
(system-specific pathname),
(system-specific file attributes)

NOTE: c=6
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 2345|3456|"erty|tyu"|4567|| 27
NOTE: c=2
2 | 1
NOTE: 2 records were read from the infile (system-specific pathname).
The minimum record length was 1.
The maximum record length was 27.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
[/pre]

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1416 views
  • 0 likes
  • 4 in conversation