Solved: Reading EBCDIC Files with null characters ('00'x) in fields

PAzevedo · Posted 04-04-2012 07:43 AM

Hi there,

I'm trying to read some EBCDIC files from a mainframe into our HP-UX SAS servers.

I'm accessing the files directly on the mainframe using FILENAME FTP.

The resulting datasets look fine, but some fields contain null ('00'x) characters in the middle. In those fields, the information is truncated at the first null character.

Below is a hexadecimal representation of one of those fields (length 30), taken from the _infile_ variable. The null characters are the '00' bytes.

F2F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F00000

When I translate these fields from EBCDIC to ASCII using the $ebcdic30. informat in the INPUT statement, everything after the first null is truncated, which leaves me with an incomplete record:

F2F0F2F5F6F0F0F0F0F0F1F4F1C2

From what I've analysed so far, the information read from the input buffer is correct. I've checked the source record in TSO, and its hex representation matches what I'm reading in SAS.

The problem seems to occur during the conversion with the $EBCDIC informat.
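As a cross-check outside SAS, the Python sketch below decodes the 30-byte field one byte at a time. It assumes the file uses EBCDIC code page 037 (Python's `cp037` codec). SAS's own $EBCDIC table may differ, especially for control characters. A table-driven translation like this one keeps all 30 bytes, so the characters after the first '00'x are still present in the buffer.

```python
# Sketch: decode one 30-byte EBCDIC field taken from the input buffer.
# Assumption: the data is EBCDIC code page 037 (Python's "cp037" codec).
FIELD_HEX = "F2F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F00000"

raw = bytes.fromhex(FIELD_HEX)      # the 30 raw bytes as read from the file
decoded = raw.decode("cp037")       # one character per byte, nulls included

# repr() makes the embedded '\x00' and '\x0c' characters visible
print(len(raw), repr(decoded))
```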

Any idea on how to overcome this issue?

Thks in advance.

Regards.

I'm attaching the code and part of the log in case they help with debugging (the table "estruturas" has the structure of the information to be read in the LOG file):

FriedEgg · Posted 04-04-2012 12:40 PM

PAzevedo,

See my earlier post. After you adjust the buffer as I did in that example, you can read the columns with a normal INPUT statement. Here I parse your data into a series of numeric and character variables, one per byte.

data foo;
 infile tmp;
 input @;
 _infile_=prxchange('s/\x00/@/o',-1,_infile_); *or translate, like Tom, I just prefer regex;
 input (v1-v12) (s370ff1.) (v13-v19) ($ebcdic1.) (v20-v30) (s370ff1.);
 put 32*'-' / (v1-v30) (10*3. /) / 32*'-';
run;
--------------------------------
  0  2  5  6  0  0  0  0  0  1
  4  1B                      2
  0  1  2  0  3  2  0  .  .  .
--------------------------------


art297 · Posted 04-04-2012 08:25 AM

Try the filename option: encoding=unicode

PAzevedo · Posted 04-04-2012 08:48 AM

It gives a warning message:

WARNING: A character that could not be transcoded was encountered.

No observations are written to the datasets, because the value that controls which dataset to output to (log_tabela) is not translated correctly:

log_tabela=’’’’’’’’’’’’’’’’’’

art297 · Posted 04-04-2012 09:23 AM

I would suggest, first, that you delete this discussion, change your password, and start a new discussion. The log you posted includes your IP address, user name and password.

When you repost, yes, the code and log would help, but seeing at least a snippet of the data would also be quite helpful.

PAzevedo · Posted 04-04-2012 09:38 AM

Thks for the warning.

The IP address is from a closed internal network so there's not a problem there. The user and pass might be a problem if you can access the internal network.

I've changed the pwd anyways and edited the log file.

I'm not sure what data I can show you, because I can only access the file through a TSO terminal. The file is variable length and, as you can see in the code, it's read based on a structure file that indicates what to read in each record.

art297 · Posted 04-04-2012 09:51 AM

It has been too many years since I've worked with mainframe systems, so I can only guess. However, from the snippet of data I could see in your log, it definitely doesn't appear to be Unicode.

The following, from the documentation, might help:

$CHARZBw.: reads character data and converts any byte that contains a binary zero to a blank.
$EBCDICw.: reads EBCDIC character data and converts it to ASCII. Under z/OS, $EBCDIC and $CHAR are equivalent.

PAzevedo · Posted 04-04-2012 10:20 AM

I'm getting somewhere with the $CHARZBw. informat.

When I input the data with that informat, the nulls are replaced with '20'x (the hex code for a space in ASCII). If I then apply the $EBCDICw. informat, the data is translated to ASCII correctly. However, instead of blanks in place of the nulls, I get '€', which is what the '20'x byte turns into after the EBCDIC translation.

Now I just have to find out how to make the $CHARZBw. informat produce the EBCDIC space hex code ('40'x) rather than the ASCII space hex code ('20'x).

In the end, the '€' character might be just as good a piece of garbage as the null characters that were in the field.
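The '€' can be reproduced outside SAS. This Python sketch rests on three assumptions that the thread does not confirm: the data uses EBCDIC code page 037, the SAS session is Latin-1, and the result is viewed on a Windows-1252 client. Under those assumptions, EBCDIC '20'x is a control character that code page 037 maps to U+0080. That character is stored as byte '80'x, and Windows-1252 displays byte '80'x as the euro sign. By contrast, the EBCDIC blank '40'x becomes an ordinary space.

```python
# Sketch: why $CHARZB. followed by $EBCDIC. shows '€' in place of a null.
# Assumptions (not confirmed in the thread): EBCDIC code page 037,
# a Latin-1 SAS session, and a Windows-1252 client viewing the data.
ascii_blank = b"\x20"                      # what $CHARZB. puts where '00'x was

as_text = ascii_blank.decode("cp037")      # read as EBCDIC: U+0080, a C1 control char
stored = as_text.encode("latin-1")         # the byte kept in a Latin-1 session: 0x80
shown = stored.decode("cp1252")            # how a Windows-1252 viewer renders 0x80

ebcdic_blank = b"\x40".decode("cp037")     # the EBCDIC blank becomes a real space

print(hex(stored[0]), shown, repr(ebcdic_blank))
```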

Thks for your help.

Any more comments are much appreciated.

Regards.

PAzevedo · Posted 04-04-2012 11:56 AM

art or anyone else,

Is there any trick to make SAS convert binary zeros to the EBCDIC space hex code ('40'x) instead of the ASCII one ('20'x) when using the $CHARZBw. informat?

It turns out that having a € in place of the binary zero doesn't solve my problem. I really need it converted to a space.

Regards.

Tom · Posted 04-04-2012 11:53 AM

Read it using $CHAR.

Translate the '00'x to '40'x (that is space in EBCDIC).

Convert to ASCII using $EBCDIC informat.

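data _null_;
 /* sample field as a hex string: 30 bytes = 60 hex digits */
 hex='F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000';
 raw=input(hex,$hex60.);           /* hex digits -> raw EBCDIC bytes */
 fix=translate(raw,'40'x,'00'x);   /* '00'x (null) -> '40'x (EBCDIC blank) */
 ascii=input(fix,$ebcdic30.);      /* EBCDIC -> ASCII */
 put hex=/ (raw fix ascii) (= $hex. /) / ascii=;
run;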
hex=F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000
raw=F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000
fix=F0F2F5F6F0F0F0F0F0F1F4F1C240404040400CF2F0F1F2F0F3F2F0404040
ascii=3032353630303030303134314220202020200C3230313230333230202020
ascii=025600000141B      
20120320

Note that your string includes '0C'x, which is normally a form feed.
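For readers without SAS at hand, here are the same three steps in Python: read the raw bytes, translate '00'x to '40'x, then convert from EBCDIC. The `cp037` codec is an assumption standing in for SAS's $EBCDIC table, and `bytes.translate` plays the role of the TRANSLATE function.

```python
# Sketch of Tom's approach: replace EBCDIC nulls with the EBCDIC blank, then convert.
# Assumption: EBCDIC code page 037 ("cp037") approximates SAS's $EBCDIC informat.
HEX = "F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000"

raw = bytes.fromhex(HEX)                     # step 1: the bytes as read with $CHAR.

# step 2: map '00'x to '40'x (EBCDIC blank); every other byte is unchanged
null_to_blank = bytes.maketrans(b"\x00", b"\x40")
fix = raw.translate(null_to_blank)

ascii_text = fix.decode("cp037")             # step 3: EBCDIC -> text

print(fix.hex().upper())
print(repr(ascii_text))                      # '\x0c' is the form feed noted above
```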

FriedEgg · Posted 04-04-2012 12:15 PM

I agree with Tom and will reiterate his note, '0C'x is a form feed (page break) so be aware of this in your data and probably be prepared to remove it.

/* resemble your file */
filename tmp temp;
data _null_;
 file tmp;
 hex='F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000'x;
 put hex;
run;
/* read your file */
data foo;
 infile tmp;
 input @;                                      /* hold the record in _infile_ */
 _infile_=prxchange('s/\x00/@/',-1,_infile_);  /* '@' is '40'x, the EBCDIC blank */
 input want $ebcdic30.;
 put want=;
run;
want=025600000141B      
20120320
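FriedEgg's replacement character works because '@' is byte '40'x in ASCII, and '40'x is the EBCDIC blank. Substituting '@' in the raw buffer is therefore the same byte-level change as Tom's TRANSLATE. The Python check below confirms the equivalence. It uses `re.sub` on bytes in place of PRXCHANGE and again assumes code page 037 for the final conversion.

```python
import re

# Check: replacing '00'x with '@' equals translating '00'x to '40'x.
# Assumption: EBCDIC code page 037 ("cp037") for the final conversion.
raw = bytes.fromhex("F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000")

via_regex = re.sub(rb"\x00", b"@", raw)          # FriedEgg: s/\x00/@/
via_translate = raw.replace(b"\x00", b"\x40")    # Tom: translate(..., '40'x, '00'x)

assert b"@" == b"\x40"                           # '@' is the same byte as the EBCDIC blank
assert via_regex == via_translate
print(repr(via_regex.decode("cp037")))
```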

PAzevedo · Posted 04-04-2012 12:41 PM

Hi FriedEgg.

That specific character isn't a problem for me, because SAS has no trouble storing it. Others such as '0A'x, '0D'x, '3F'x and '25'x are a problem, though.
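To see what those particular bytes turn into, the sketch below decodes each of them with Python's `cp037` codec. This is an assumption: the mainframe may use a different EBCDIC code page, and SAS's table may differ. Under code page 037, '25'x becomes a line feed and '0D'x becomes a carriage return. Either one can presumably break line-oriented processing on the ASCII side. '0C'x stays a form feed.

```python
# Sketch: what the "problem" EBCDIC bytes become after EBCDIC -> ASCII conversion.
# Assumption: code page 037 ("cp037"); other EBCDIC code pages differ here.
PROBLEM_BYTES = [0x0A, 0x0C, 0x0D, 0x25, 0x3F]

converted = {b: bytes([b]).decode("cp037") for b in PROBLEM_BYTES}

for b, ch in converted.items():
    # repr() shows control characters as escapes, e.g. '\n' or '\x1a'
    print(f"EBCDIC '{b:02X}'x -> U+{ord(ch):04X} {ch!r}")
```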

There is a lot of garbage in our mainframe tables that has given us many headaches over time!

To give an almost complete picture: we are changing all our SAS applications to get the information they need directly from the mainframe through FTP. Until now, the mainframe parsed the files and then sent them to our SAS servers (the new offload policy changes this). As a result, every piece of parsing currently done by the applications that send us the files (e.g. delimiters, special characters, etc.) must be reproduced exactly by our SAS apps. Otherwise, there is a huge risk that the new data won't match the data we have today.

The '0C'x isn't currently being parsed by the mainframe apps, so it will stay the same in the new SAS app.

But now that you and Tom have mentioned it, I'm starting to feel that there isn't the simple, lightweight solution to my problem that I expected.

I've been avoiding Perl regular expressions for a long time, but I believe now is the right time to dive into them.

Thks for your help.

Regards.

PAzevedo · Posted 04-04-2012 12:24 PM

Hi Tom,

Reading the file with the $CHARw. informat behaves the same way as $EBCDICw.: the field is truncated after the first null.

Picking up on your suggestion, I'm creating one variable that holds the whole _infile_ buffer converted to hexadecimal, so that I can apply your translation code to it afterwards. After this parsing, I'll have to use substrings to get every column. This seems a very heavy solution, but so far it is the best one I have.

Tom · Posted 04-04-2012 12:28 PM

Just use the _INFILE_ trick as posted above by FriedEgg. The regex is overkill though for such a simple problem.

data want ;
 infile .... ;
 input @;                                   /* hold the record in _infile_ */
 _infile_=translate(_infile_,'40'x,'00'x);  /* nulls -> EBCDIC blanks */
 input field1 $ebcdic20. field2 $ebcdic12. .... ;
run;
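The pattern is to clean the whole buffer once and then read each field at its position. This Python sketch applies it to a layout-driven reader like the one described earlier in the thread, where a structure file says what to read in each record. The layout below is hypothetical and was made up to fit the sample field. As in the other sketches, `cp037` stands in for $EBCDIC.

```python
# Sketch: clean the whole record once, then cut fields from a layout table.
# Assumptions: code page 037, and a HYPOTHETICAL layout of (name, start, width).
LAYOUT = [("code", 0, 14), ("filler", 14, 6), ("date", 20, 8)]

def parse_record(record: bytes, layout=LAYOUT) -> dict:
    """Translate every '00'x to the EBCDIC blank, then decode each field."""
    fixed = record.replace(b"\x00", b"\x40")   # like translate(_infile_,'40'x,'00'x)
    return {name: fixed[start:start + width].decode("cp037")
            for name, start, width in layout}

record = bytes.fromhex("F2F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F00000")
fields = parse_record(record)
print(fields)
```

Compared with converting the whole buffer to a hex string and taking substrings, this needs only one pass over the raw bytes per record.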

PAzevedo · Posted 04-04-2012 12:44 PM

I hadn't read FriedEgg's post when I answered you.
