Re: Reading in an EBCDIC RDF with 2 sets of variables

deleted_user · Posted 07-03-2006 11:34 AM

Hello,

We receive a raw EBCDIC data file from First Data (it contains sensitive information, so unfortunately I cannot provide a sample), which we read into SAS and convert to ASCII. We've done about 6 of these types of files without problems, but this specific data file is giving us trouble.

The issue is that the file FD provides us contains two sets of variables and lengths...basically the file contains chapter1 and chapter2 data.

Chapter1 is a fixed record length file of 1568 with 159 different variables/columns.
Chapter2 is also fixed record length of 188, with only 28 variables.
The data file mixes records from both chapters in no particular order, so record 1 could be either ch1 or ch2. The file is also different from day to day.

I can easily segregate the second chapter into a separate data file, because we know it has a unique identifier field and its records are only 188 characters long. Here's part of the code for the ch2 segregation:

libname cd 'D:\FileConversion\CD061';
filename readin "D:\FileConversion\CD061\CD061.EBCDIC.DAT";

data cd.temp3 (compress=yes);
infile readin recfm=f lrecl = 188 firstobs=1 ;

input
@1 ACCT_TOT_KEY_SYSTEM $ebcdic4.0
@5 ACCT_TOT_KEY_PREFIX $ebcdic7.0
...
;
if L_TPI_B eq 'T';
run;

This creates a dataset with only data from ch2. The file contains 1149 records total, and we kept 475 from ch2 according to the log:

NOTE: The infile READIN is:
File Name=D:\FileConversion\CD061\CD061.EBCDIC.DAT,
RECFM=F,LRECL=188

NOTE: 1149 records were read from the infile READIN.
NOTE: The data set WORK.A has 475 observations and 28 variables.

So the remaining records belong to chapter1.
Up until this point it works great.

Now I'm trying to get the remaining chapter 1 records (674 in total for this specific file) into another dataset, but I'm stuck on how to separate them. Here's the current code:

data cd.temp4 (compress=yes);
infile readin recfm=f lrecl=1568 missover firstobs=1 ;

input
@12 MULT_FLG $ebcdic1.0;
if MULT_FLG ne 'T' then do;

input
@1 ACCT_JOUR_KEY_SYSTEM $ebcdic4.0
@5 ACCT_JOUR_KEY_PREFIX $ebcdic7.0
..
;
end;
if MULT_FLG ne 'T';
run;

With the lrecl set to 1568, the dataset looks garbled. If I change the lrecl back to 188, ch1 reads in fine, but of course only the first 188 characters of each record, which is approx 28 ch1 variables (so over 100 variables are missing).

I think my question is this: how can I separate out the ch2 data from the file and keep only ch1, with all of its 150+ variables?

I've tried various "infile" options, including flowover, missover, truncover.
I've tried different lrecl sizes, but only 188 makes the ch1 data appear correct (but this only lists ~20 variables out of the 150+ total variables for chapter 1).

Hope someone can shed some light on this issue. I've contacted SAS directly, but haven't yet been able to get a solution.

Olivier · Posted 07-04-2006 04:16 AM

What if you try adding the PAD option to your INFILE statement, to make sure of the length of the lines you're reading?
And I would add an @ sign at the end of your first INPUT statement, so that the "reading cursor" "pauses" on this record instead of moving on to the next one at your next INPUT statement (I don't know if I'm very clear on this point).
Does it change anything?
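
To make the trailing @ idea concrete, here is a minimal sketch, not a tested solution. It assumes that the one-byte flag at column 12 (the position used in the ch1 attempt above) equals 'T' on chapter 2 records, and it only reads the first two fields of each layout as shown in the thread. The LRECL=188 value is copied from the working ch2 step and does not settle the record-length question.

```sas
data ch1 ch2;
  infile readin recfm=f lrecl=188;

  /* Read only the flag; the trailing @ holds this record */
  /* so that the next INPUT statement re-reads the same one. */
  input @12 MULT_FLG $ebcdic1. @;

  if MULT_FLG = 'T' then do;
    /* assumption: 'T' at column 12 marks a chapter 2 record */
    input @1 ACCT_TOT_KEY_SYSTEM $ebcdic4.
          @5 ACCT_TOT_KEY_PREFIX $ebcdic7.;
    output ch2;
  end;
  else do;
    input @1 ACCT_JOUR_KEY_SYSTEM $ebcdic4.
          @5 ACCT_JOUR_KEY_PREFIX $ebcdic7.;
    output ch1;
  end;
run;
```

Without the @ on the first INPUT, the second INPUT would start on the following record, so the flag and the fields would come from two different records.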

Another trick you can use to read both Chapter 1 and 2 at the same time :
1) set your LRECL at the max, with a PAD option
2) add an INPUT ; statement (with nothing else in it)
3) test whether LENGTH(TRIM(_INFILE_)) > 200 : you will know which chapter you're currently reading...
... or at least I hope so.
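
The three steps above could be sketched as follows. This is only an illustration under assumptions: it presumes SAS actually sees records of different physical lengths (with RECFM=F every record is exactly LRECL bytes long, so the test would not discriminate), the variable name rec_len is made up for the example, and a trailing @ is added to the empty INPUT so that the same record can be parsed afterwards.

```sas
data ch1 ch2;
  /* 1) LRECL at the maximum record length, with PAD */
  infile readin lrecl=1568 pad;

  /* 2) INPUT with no variables: loads the record into _INFILE_; */
  /*    the trailing @ (an addition here) holds it for re-reading */
  input @;

  /* 3) length of the record once the padding blanks are trimmed */
  rec_len = length(trim(_infile_));   /* rec_len is a hypothetical name */

  if rec_len > 200 then do;
    input @1 ACCT_JOUR_KEY_SYSTEM $ebcdic4.
          @5 ACCT_JOUR_KEY_PREFIX $ebcdic7.;
    output ch1;
  end;
  else do;
    input @1 ACCT_TOT_KEY_SYSTEM $ebcdic4.
          @5 ACCT_TOT_KEY_PREFIX $ebcdic7.;
    output ch2;
  end;
run;
```

One thing to keep in mind: TRIM only removes the blanks of the session encoding (the ones PAD adds), so trailing EBCDIC blanks inside the data are probably counted in the length.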

deleted_user · Posted 07-05-2006 10:07 AM

> What if you try adding a PAD option to your INFILE
> statement to make sure of the length of the lines
> you're reading ?
> And I would add a @ sign at the end of your first
> INPUT statement, causing the "reading cursor" to
> "pause" on this record, and not going to the next one
> on your next INPUT statement (I don't know if I'm
> very clear on this point).
> Does it change anything ?

I added pad and an @ after the first input statement, but nothing changes. The code looks like this:

data ch1 (compress=yes);
infile readin recfm=f lrecl=1568 truncover pad ;

input
@1 ACCT_JOUR_KEY_SYSTEM $ebcdic4.0
@5 ACCT_JOUR_KEY_PREFIX $ebcdic7.0
...
..
@1523 FILLER $ebcdic39.0
@
;
if MULT_FLG ne 'T';
run;

>
> Another trick you can use to read both Chapter 1 and
> 2 at the same time :
> 1) set your LRECL at the max, with a PAD option
> 2) add an INPUT ; statement (with nothing else in
> it)
> 3) test whether LENGTH(TRIM(_INFILE_)) > 200 : you
> will know which chapter you're currently reading...
> ... or at least I hope so.

I'm not sure I understood the point about the empty INPUT; statement and the LENGTH test.

deleted_user · Posted 07-04-2006 11:56 AM

The INFILE option LENGTH= names a variable you can test.

eg : /* skeletal demo */

data ch1( keep= )
ch2( keep= );
infile readin recfm=v lrecl=2000 LENGTH= LEN TRUNCOVER ;
* use truncover to ensure a short last field is read OK ;
INPUT @;
IF LEN > 1000 THEN DO;
*parse/input according to chapter1 ;
output ch1 ;
end;
else do;
*parse/input according to chapter2 ;
output ch2 ;
end;
run;

deleted_user · Posted 07-05-2006 10:21 AM

Another observation I'd like to add: the RDF contains a header in the first row/record, and the actual data starts at position 189.

I found that by doing:

data temp (compress=yes);
infile readin recfm=f lrecl = 188 ;

input
@ 1 a0001 $ebcdic1.0
@ 2 a0002 $ebcdic1.0
...
..
.
@ 188 a0188 $ebcdic1.0;
run;

All the data lines up perfectly, with one character in each column. With lrecl set to 188, this seems to be the only way to make the data look clean.
