PAzevedo
Fluorite | Level 6

Hi there,

I'm trying to read some EBCDIC files from a Mainframe into our HP-UX SAS servers.

I'm accessing the files directly on the Mainframe using filename ftp.

The resulting datasets look fine, but I'm facing a problem with some fields that have null ('00'x) characters in the middle: the content of those fields gets truncated at the first null character.

Below is a hexadecimal representation of one of those fields, with a length of 30, taken from the _infile_ variable. The null characters are the 00 byte pairs.

F2F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F00000

When I translate these fields from EBCDIC to ASCII, using the $ebcdic30. informat in the input statement, everything after the first null gets truncated, leaving me with an incomplete record:

F2F0F2F5F6F0F0F0F0F0F1F4F1C2

From what I have been able to analyse, the information read from the input buffer is correct: I checked the source record in TSO and its hex representation matches the one I'm reading in SAS.

The problem seems to occur when converting with the $ebcdic informat.

Any idea how to overcome this issue?

Thanks in advance.

Regards.

I'll attach the code and a partial log for any debugging needs (the table "estruturas" has the structure of the information to be read in the LOG file):

Accepted Solutions
FriedEgg
SAS Employee

PAzevedo,

See my posting: you can read the columns with a normal input statement after adjusting the buffer as I did in the example. Here I parse your data into a series of numeric and character variables, one for each byte.

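data foo;
  infile tmp;
  input @;   /* load the record into the buffer and hold it */
  /* replace each null with '@', which is '40'x on an ASCII host, i.e. the EBCDIC space */
  _infile_=prxchange('s/\x00/@/o',-1,_infile_); *or translate, like Tom, I just prefer regex;
  /* re-read the adjusted buffer: one variable per byte */
  input (v1-v12) (s370ff1.) (v13-v19) ($ebcdic1.) (v20-v30) (s370ff1.);
  put 32*'-' / (v1-v30) (10*3. /) / 32*'-';
run;

--------------------------------
  0  2  5  6  0  0  0  0  0  1
  4  1B                      2
  0  1  2  0  3  2  0  .  .  .
--------------------------------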


20 Replies
art297
Opal | Level 21

Try the filename option: encoding=unicode

PAzevedo
Fluorite | Level 6

It gives a Warning message

WARNING: A character that could not be transcoded was encountered.

and no observations are written to the datasets, because the value that controls which dataset to output to (log_tabela) is not translated correctly

log_tabela=’’’’’’’’’’’’’’’’’’

art297
Opal | Level 21

I would suggest, first, that you delete this discussion, change your password, and start a new discussion.  The log you posted has your ip address, user name and password included.

When you repost, yes the code and log would help, but seeing at least a snippet of the data would also be quite helpful.

PAzevedo
Fluorite | Level 6

Thanks for the warning.

The IP address is from a closed internal network, so there's no problem there. The user and password might be a problem if you can access the internal network.

I've changed the password anyway and edited the log file.

I'm not sure what data I can show you, as I can only access the file through a TSO terminal. The file is variable length and, as you can see in the code, it's read based on a structure file which indicates what to read in each record.

art297
Opal | Level 21

It has been too many years since I've worked with Mainframe systems, thus I can only guess.  However, looking at the snippet of data that I could see in your log, it definitely doesn't appear to be unicode.

The following, from the documentation, might help:

$CHARZBw. reads character data and converts any byte that contains a binary zero to a blank.

$EBCDICw. converts character data from EBCDIC to native format. Under z/OS, $EBCDIC and $CHAR are equivalent.

PAzevedo
Fluorite | Level 6

I'm getting somewhere with the $CHARZBw. informat.

When I input the data with that informat, the nulls are replaced with '20'x (the hex code for space in ASCII). If I then apply the $EBCDICw. informat, the data is translated correctly to ASCII, but in place of the nulls I get '€' instead of blanks, which is the EBCDIC symbol for the '20'x hex code.

Now I just have to find out how to make the $CHARZBw. informat translate the nulls to the EBCDIC space hex code rather than the ASCII space hex code.
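
A minimal sketch of the two-step conversion described above, applied to a single null byte, assuming an ASCII host such as the HP-UX server. The variable names step1 and step2 are only illustrative. It prints the hex value after each informat so the intermediate '20'x, and what $EBCDIC1. makes of it, can be inspected in the log:

```sas
data _null_;
  length step1 step2 $1;
  /* $CHARZB1. turns the binary zero into the native blank ('20'x on an ASCII host) */
  step1 = input('00'x, $charzb1.);
  /* $EBCDIC1. then treats that '20'x as an EBCDIC byte; the EBCDIC space is '40'x, not '20'x */
  step2 = input(step1, $ebcdic1.);
  /* show both intermediate values in hex */
  put step1= $hex2. step2= $hex2.;
run;
```

The mismatch comes from mixing a native-encoding blank into data that is still EBCDIC, so the replacement byte has to be '40'x before the $EBCDICw. step rather than '20'x.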

In the end, the '€' character might be just as good a piece of garbage as the null characters that were in the field. :)

Thanks for your help.

Any more comments are much appreciated.

Regards.

PAzevedo
Fluorite | Level 6

art or anyone else,

Is there any trick to make SAS convert the binary zeros to the EBCDIC space hex code ('40'x) instead of the ASCII one ('20'x) when using the $CHARZBw. informat?

It turns out that having a € in place of the binary zeros doesn't suit my problem; I really need them converted to spaces.

Regards.

Tom
Super User

Read it using $CHAR.

Translate the '00'x to '40'x (that is space in EBCDIC).

Convert to ASCII using $EBCDIC informat.

data _null_;
  hex='F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000';
  raw=input(hex,$hex60.);            /* hex text -> 30 raw EBCDIC bytes */
  fix=translate(raw,'40'x,'00'x);    /* null -> EBCDIC space */
  ascii=input(fix,$ebcdic30.);       /* EBCDIC -> ASCII */
  put hex=/ (raw fix ascii) (= $hex. /) / ascii=;
run;

hex=F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000
raw=F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000
fix=F0F2F5F6F0F0F0F0F0F1F4F1C240404040400CF2F0F1F2F0F3F2F0404040
ascii=3032353630303030303134314220202020200C3230313230333230202020
ascii=025600000141B     
20120320

Note that your string includes '0C'x, which is normally a form feed.

FriedEgg
SAS Employee

I agree with Tom and will reiterate his note, '0C'x is a form feed (page break) so be aware of this in your data and probably be prepared to remove it.

/* simulate your file */
filename tmp temp;
data _null_;
  file tmp;
  hex='F0F2F5F6F0F0F0F0F0F1F4F1C200000000000CF2F0F1F2F0F3F2F0000000'x;
  put hex;
run;

/* read your file */
data foo;
  infile tmp;
  input @;                                       /* load the record and hold it */
  _infile_=prxchange('s/\x00/@/',-1,_infile_);   /* null -> '@' ('40'x on an ASCII host, the EBCDIC space) */
  input want $ebcdic30.;                         /* re-read the adjusted buffer */
run;

want=025600000141B     

20120320
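
If the '0C'x form feed does need to go, a minimal sketch of two options follows, using the same sample bytes with the nulls already mapped to '40'x. The variable names clean and blanked are only illustrative. COMPRESS drops the byte and shortens the value, while TRANSLATE swaps it for a blank so the remaining characters keep their positions:

```sas
data _null_;
  length clean blanked $30;
  /* sample field from this thread, nulls already replaced by the EBCDIC space */
  raw = 'F0F2F5F6F0F0F0F0F0F1F4F1C240404040400CF2F0F1F2F0F3F2F0404040'x;
  ascii = input(raw, $ebcdic30.);          /* EBCDIC -> ASCII */
  clean = compress(ascii, '0C'x);          /* remove the form feed entirely */
  blanked = translate(ascii, ' ', '0C'x);  /* or replace it with a blank */
  /* print in hex so the control character cannot disturb the log */
  put clean= $hex60. / blanked= $hex60.;
run;
```

Which one fits depends on whether later steps rely on fixed column positions inside the field.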

PAzevedo
Fluorite | Level 6

Hi FriedEgg.

That specific character isn't a problem for me, as SAS has no problem storing it, although others like '0A'x, '0D'x, '3F'x and '25'x are.

There is a bunch of garbage in our Mainframe tables that has given us a lot of headaches over time!

To give almost the full picture: we are changing all our SAS applications to get the data they need directly from the Mainframe, through FTP, instead of having the Mainframe parse the files and then send them to our SAS servers (a new offload policy). This means that every parsing step currently done by the applications that send us the files has to be reproduced exactly (e.g. delimiters, special characters, etc.) by our SAS apps, or else there is a huge risk of the new data not matching what we have today.

The '0C'x isn't currently being parsed by the Mainframe apps, so it will stay the same in the new SAS app.

But now that you and Tom have mentioned it, I'm starting to feel that there isn't really a solution for my problem as simple and light as I expected.

I've run away from Perl regular expressions for a long time, but I believe now is the right time to dive into them.

Thanks for your help.

Regards.

PAzevedo
Fluorite | Level 6

Hi Tom,

Reading the file with the $CHARw. informat behaves the same as with $EBCDICw.: the field is truncated as soon as a null is encountered.

Picking up on your suggestion, I'm creating one variable containing the whole _infile_ buffer converted to hexadecimal, so I can apply your translation code afterwards. After this parsing I'll have to substring to get every column. This seems a very heavy solution, but it is the best one I have so far.

Tom
Super User

Just use the _INFILE_ trick as posted above by FriedEgg.  The regex is overkill though for such a simple problem.

data want ;
   infile .... ;
   input @;                                     /* load the record and hold it */
   _infile_=translate(_infile_,'40'x,'00'x);    /* null -> EBCDIC space */
   input field1 $ebcdic20. field2 $ebcdic12. .... ;
run;

PAzevedo
Fluorite | Level 6

I hadn't read FriedEgg's post when I answered you. :)
