DATA Step, Macro, Functions and more

$ebcdic missing 8 bytes with locale off

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 124
Accepted Solution

$ebcdic missing 8 bytes with locale off

Greetings all.  The short of it is that even with the 'locale' option commented out in my config file, the $ebcdic format still fails to translate 8 characters correctly according to IBM-037 code page.  This is a great improvement over the 33 missing translations when local is set to en_us, but I still would like to get them all working.  I'm using 9.3 on Windows 7 32 bit.  I've tried using trantab to edit both the ansiebcd and ebcdic tables, and saved them to my sashelp.host library, but they don't appear to have anything to do with the $ebcdic format.  Based on what I found when googling, it seems the $ebcdic format uses translation tables loaded (and uneditable?) at start up.  I'm just wondering if there is a way to fix this issue.  My work around is to load up an array with the round-trip values of extended ascii to ebcdic(037) and back, then use that array to compare byte by byte.  It works, but seems a bit of overkill.  I'm using odbc to pull data from DB2, one field of which is binary data that contains fields of text and also packed decimal.  I need to convert the binary to hex, then parse out the packed decimal.  I know I could just use the DB2 HEX() function to convert before I import, but that would spoil the fun.  Anyhow, the attached program should basically show the problem.  If you run it, the result will be a table of the 8 error bytes I need to fix.  Any help would be much appreciated.  Thank you.

Greg

*Edit* - I found this, 14742 - Generated translation tables for $EBCDICw. and $ASCIIw. informatscontain invalid values, which seems to speak to the issue of locale, but also seems to imply the missing values are attributable to '00'x being in many of the cells.  However, the 8 I'm missing must be values other than '00'x that are just wrong.

Added link - Greg

Attachment

Accepted Solutions
Solution
‎04-23-2014 06:44 PM
Super User
Super User
Posts: 7,042

Re: $ebcdic missing 8 bytes with locale off

I do not think there is any need for you to worry about translation if you can get DB2 to convert to HEX and then convert back from HEX to binary.

So if you perform this query.

proc sql ;

connect to odbc (.....);

create table sasdata as

select input(hexfield,$hex512.) as field

from connection to odbc

(select hex(field) as hexfield from .... )

;

quit;

Then you should have the data back as it was originally .

Then you could process the first 7 bytes using:

data want ;

  set sasdata;

  category = input(substr(field,1,1),$ebcdic1.);

  value = input(substr(field,2,6),s370fpd6.2);

run;

View solution in original post


All Replies
Trusted Advisor
Posts: 3,212

Re: $ebcdic missing 8 bytes with locale off

Please explain your issue more in detail.

There are chars in Ebcdic that cannot be converted to Ascii. Tey are mostly faked ( not sign, cent sign) the special control chars (eof,lf,cr) also having issues.

You could use utf-8 as used at Windows/office Html, but the mainframe is not supporting that.  That issue-s are part of NLS, see:SAS(R) 9.4 National Language Support (NLS): Reference Guide, Second Edition as both sides codepages are important. By the way the latin1 charset of Windows (java) is not the same as that of Unix based ones.

You could get traumatized.

---->-- ja karman --<-----
Frequent Contributor
Posts: 124

Re: $ebcdic missing 8 bytes with locale off

Jaap, this is about as simple as I can get it...

data _null_ ;

    ascii_hex = put(byte(10),$hex2.) ;

    ebcdic_hex = put(put(byte(10),$ebcdic.),$hex2.) ;

    put 'ascii_hex for decimal value 10 = ' ascii_hex ;

    put 'ebcdic_hex using $ebcdic. = ' ebcdic_hex ', but it should be 25' ;

run ; quit ;

There are 8 codepoints not properly translating when using the $ebcdic format with locale= turned off.  This is just a visual of decimal value 10 (ascii code point '0A').  It should map to ebcdic codepoint '25', but the $ebcdic format maps it to '15'.  The other decimal values not properly translating, as per IBM_037 are 91, 93, 94, 133, 168, 172, and 221)

Specifically, does anyone know of a way to change the mapping of the 8 decimal values listed above when using the $ebcdic format? I've checked my array workaround on millions of observations, and the translations I took from the 'Code page translation' section at the bottom of EBCDIC 037 - Wikipedia, the free encyclopedia are accurate for my data 100% of the time.  Thank you.

Greg

Super User
Posts: 19,789

Re: $ebcdic missing 8 bytes with locale off

Sounds like an issue to bring up with Tech Support.

Super User
Super User
Posts: 7,042

Re: $ebcdic missing 8 bytes with locale off

The mapping from EBCDIC -> ASCII and back again is not well defined.  Most mappings I have seen are not reversible.

For example how do we map the EBCDIC code 15 (which is intended as NL or NEW LINE) to ASCII?  There is not a direct equivalent.  Perhaps we should map it to the code for LF.  That makes good sense on a Unix system that uses LF as the end of line character in text files.

What are you actually trying to achieve?  Perhaps you would get better results using TRANSLATE function instead of the $EBCDIC format?

Frequent Contributor
Posts: 124

Re: $ebcdic missing 8 bytes with locale off

Tom, what I am doing is this:  pull out the packed decimal portions of a binary field, the structure of which, when converted to hex, is an array of elements each 14 characters long.  So, take this ebcdic hex string as an example...

C200000006377CD400000054318C

When I import it through ODBC as binary, it looks like gobbledygook, but when converted to ebcdic hex it looks as above.  Knowing the structure of the data, I can take the packed decimal (6.2) portions and sum them to get the customers' account balance.  I have to lop off the first byte of each element first, the 'C2' and 'D4'.  So, in this case, it would be 63.77 (from the first element,C200000006377C) plus 543.18 (from the second element, D400000054318C) to get an account balance of 606.95.  The problem is that I have to first convert the binary to hex, which, since my machine is Windows, ends up being ascii hex.  So, the above ebcdic hex in ascii ends up being...

420000008604404D000000E891F0

I then have to take this ascii hex string and convert it to ebcdic hex.  That is where the missing values trip me up.  The easiest solution is to just use the DB2 HEX() function which would preclude me from having to do the conversion on my machine.  And this would make the most sense, since I would be letting the mainframe do what it is supposed to do.  My array solution also works like a charm.  I guess my curiosity in wanting to do it all on my machine is purely academic, just to see if it can be done.  Anyhow, I have a couple of solutions so I'll just stick with what works, and accept that there are limitations with round trip conversions between ebcdic and extended ascii.  Thank you all for taking a look.

Greg

Super User
Super User
Posts: 7,042

Re: $ebcdic missing 8 bytes with locale off

I still do not understand.  If you moved it as binary then just use an INFORMAT to read the binary string.  Why are you worrying about converting binary fields?

data xx ;

  input str $hex12. ;

  x = input(str,s370fpd6.2);

  put str $hex12. +1 x ;

cards ;

00000006377D

00000054318C

00000060695C

run;

00000006377D -63.77

00000054318C 543.18

00000060695C 606.95

Trusted Advisor
Posts: 3,212

Re: $ebcdic missing 8 bytes with locale off

Tom, I believe It was first converted Ebcdic-Ascii by the ODBC interface and of that conversion (packed fields) needed to be translated back. 

Binary is binary and at that level there not difference HEX-ascii or HEX-ebcdic. That story is giving me the impression of that additional conversion.

---->-- ja karman --<-----
Frequent Contributor
Posts: 124

Re: $ebcdic missing 8 bytes with locale off

Tom, the field is an array of structured data.  So, the string 'C200000006377CD400000054318C' is two elements, each 7 bytes, with the first byte of each element representing a text account category.

C2 00000006377C D4 00000054318C

The 'C2' represents a category of 'B' (ebcdic 'C2' = ascii '42' = 'B'), with the rest of that element being a packed decimal = 63.77.  The next element, starting at the 8th byte, is for category 'M' (ebcdic 'D4' = ascii '4D' = 'M'), with a packed decimal value of 543.18.  So, I need to lop off the first byte of each element in the field before I can convert the rest to a decimal value so I can sum them.  The DB2 field is 256 bytes, so it is possible for any one observation to have a string of 30 or more 7 byte elements that all need to be have the first byte lopped off then bytes 2 through 7 summed.  Is there a way to get rid of the first byte of each element without converting it to a hex string?  If so, that might be worth a try.  Thanks.

Super User
Super User
Posts: 7,042

Re: $ebcdic missing 8 bytes with locale off

So your example has fields of length 7 bytes. With the first byte being a character and the next 6 being packed decimal.

You can loop over you larger field and parse it into individual records.

Here is one example way using your sample value.

data xx ;

  length field $256 ;

  field = 'C200000006377CD400000054318C'x ;

  field = translate(field,'40'x,' ');

  sum=0;

  length next $7 category $1 value 8 sum 8 ;

  do i=0 by 1 until (category=' ');

    next = substr(field,1+i*7);

    category = input(next,$ebcdic1.);

    if category ne ' ' then do;

      value = input(substr(next,2),s370fpd6.2);

      sum=sum(sum,value);

      output;

    end;

    put i= next=$hex14. +1 category= +1 category $hex2. +1 value= 7.2 ;

  end;

  put sum=;

run;

i=0 next=C200000006377C  category=B  42 value=63.77

i=1 next=D400000054318C  category=M  4D value=543.18

i=2 next=40404040404040  category=   20 value=543.18

sum=606.95

Frequent Contributor
Posts: 124

Re: $ebcdic missing 8 bytes with locale off

Thank you Tom.  That is kind of what I am doing with my array solution posted at the start of the thread (although not nearly as elegantly as you have done).  That brings me back to my issue with ascii to ebcdic translations.  Take, for example, this one, which I have translated using my own array of ibm-037.

C200000008158CD500000003216C (this is what I need the resulting hex string to be)

Since I am getting the data as binary, I have to first convert it to hex (ascii), which gives me...

420000009785F04E000000038125, which then has to be converted to ebcdic hex, and therein lies my problem

Notice this(added spaces intentional) conversion when I use $ebcdic to convert the ascii hex to ebcdic..

4200000097 85 F04E000000038125 -->> C200000008 25 8CD500000003216C

Everything is fine except for the conversion of the '85'x to '25'x.  According to ibm-037, it should be converting to '15'x, so any fields that have a '85'x in them as ascii hex give the wrong answer.

I did just speak to one of the mainframe guys who told me this field was designed to be fed into a COBOL function on the client 3270, converted to hex, then sliced and diced by the COBOL.  It was not intended for a direct manipulation by any other program.  Given that tidbit, however, I still think $ebcdic could work if I could just fix the mappings of the 8 errored characters.

Greg

Super User
Super User
Posts: 7,042

Re: $ebcdic missing 8 bytes with locale off

I am still confused about where the problem is being introduced.  Let me restate and you can tell me what I have done wrong.

First let's use SAS notation and use hex literals ("hhhh"x)  to reference values that have been converted to hex so humans can read them and normal quoted strings ("fred")  for values that are the actual characters. (for example the three character string "123" would be stored in EBCDIC as the three byte string "F1F2F3"X and in ASCII as the string "313233"X )

Say your database variable is named FIELD.

My understanding is that the data in the database is stored as BINARY. So for example the first 14 bytes of the 256 byte field is 'C200000008158CD500000003216C'X.

If this field is moved by your database connection as BINARY data then it will look exactly the same in SAS.  So if in SAS you used the $HEX format to look at the value (put field $hex28.Smiley Wink, then you would see in the log the 28 character string 'C200000008158CD500000003216C' .


If that is the case then the code above should work without any issues.  There is no need to do anything within SAS to try and convert this binary field to or from ASCII.  (Other than if you want to interpret one or more of the bytes as EBCDIC characters as in the code above.)


So let's assume that it is NOT the case. This means that your database connection does NOT allow you to move a field as binary and has instead imposed some type of transcoding on the data.

Now if SAS had done the conversion using the $ECBDIC informat (instead of your database connection software) then you would see in your log from the previous PUT statement a string like:

'420000000000F04E000000038125'.  Your string might be different as the conversion has been either done somewhere before SAS gets a hold of it or using some other component of SAS. In fact it better NOT match the string generated by the $EBCDIC. format because that format maps 28 different bytes to '00'x, as so it is NOT reversible. But as long as the translation that your database connect does use maps each possible byte is to a unique byte then you can reverse the translation by creating a translation table or string or whatever you want.


So if you are in this situation then you want to apply something like the TRANSLATE function to FIELD to convert it back to the BINARY data that it used to have ( that is back  to 'C200000008158CD500000003216C'X ) and then you can use the previous post to parse it into blocks and convert the text part to ASCII and the number part to numbers.  A good way to test that the translation is reversible is to fill a record in the database will all 256 possible byte values and pull it into SAS. This will also have the added benefit of generating the translation string that you will need to use to reverse the conversion.


Now if you cannot get that to work because the transcoding process is NOT reversible then you will need to find a way to convert the binary field to hex strings and copy that from the database into SAS.  For example in SAS I would use the PUT() function with the $HEX. format to convert a string into its hex equivalent.  I do not know what statement you would use on DB2 to do that.  So now once you have your now 512 character string into a SAS variable you can use the $HEX informat to convert it back to its original binary value.  So using the same 14 byte sequence this means that in DB2 you would convert the 14 byte string 'C200000008158CD500000003216C'x to the 28 byte string "C200000008158CD500000003216C".  Let's call this new variable HEXFIELD.   This would then be copied into SAS without any issue as it will only be using 16 characters.  Then in SAS you could re-create FIELD by using ( field = input(hexfield,$hex28.) .  There is no need to reference any translation functions or arrays as you now have your actual binary strings back and can process them.  You might have some issue if your values are variable length.  But you can probably easily deal with that by either padding properly or also copying over a variable with the length.

Frequent Contributor
Posts: 124

Re: $ebcdic missing 8 bytes with locale off

Tom, thank you so much for putting so much effort into understanding the issue.  I'm not sure if there is any transcoding going on in the odbc import, but you are exactly correct in your paragraph starting with 'So let's assume that it is NOT the case'.  After importing, when I put the field into $hex format, I get the ascii string '420000000000F04E000000038125', and, even though I get errors when I try to put it to ebcdic hex, it is translated from the binary to ascii hex 100% correctly 100% of the time.  It seems odd that it cannot make the round trip conversion, but so it is.  Using your example field name of 'FIELD' in the DB2 table, the DB2 equivalent of $hex is the SQL function HEX().  In my odbc sql, I can do 'SELECT HEX(FIELD) FROM TABLE', and I get the correct ebcdic hex string 100% of the time.  I am going with your suggestion that the transcoding process may not be reversible, and I will have to live with my array solution, or use the HEX() function in my sql.  You've given me lots to think about, and if I make any progress, I'll post back to this thread.  Thank you again.


Greg

Solution
‎04-23-2014 06:44 PM
Super User
Super User
Posts: 7,042

Re: $ebcdic missing 8 bytes with locale off

I do not think there is any need for you to worry about translation if you can get DB2 to convert to HEX and then convert back from HEX to binary.

So if you perform this query.

proc sql ;

connect to odbc (.....);

create table sasdata as

select input(hexfield,$hex512.) as field

from connection to odbc

(select hex(field) as hexfield from .... )

;

quit;

Then you should have the data back as it was originally .

Then you could process the first 7 bytes using:

data want ;

  set sasdata;

  category = input(substr(field,1,1),$ebcdic1.);

  value = input(substr(field,2,6),s370fpd6.2);

run;

Frequent Contributor
Posts: 124

Re: $ebcdic missing 8 bytes with locale off

Tom, thank you so much, that works perfectly.  I'll have to take some time to understand why it works when $ebcdic was mapping to wrong code points the way I was using it.  Anyhow, this makes life much easier.  Thank you so much for taking your good time and knowledge to stick with us foundlings on the forum, I have learned all I know about sas (which isn't much) from folks like you here.

Greg

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 22 replies
  • 695 views
  • 6 likes
  • 5 in conversation