BookmarkSubscribeRSS Feed
gsnidow
Obsidian | Level 7

Greetings all.  This is a follow up to my last question here, https://communities.sas.com/thread/34254 where I was trying to convert between ascii and ebcdic values.  However, this is a new problem I am having, so I will leave that one alone.  As it is right now, I am able to use an array in a data step to provide the lookup values of ascii to ebcdic.  Basically I am looking at each set of 2 characters in the ascii string, and returning the ebcdic value, which is contatanated to itself to produce the full ebcdic string.  Below is the code

/* set up an accounts table to hold the ascii string.  In reality there would be millions of observations,

   and the ascii_string could be up to 512 characters in length.*/

data Accounts;

input ID ASCII_STRING $3-30;

datalines;

1 420000009D800C4400000002DF40

;

/* Now use an array to provide the ascii to ebcdic lookup values, and use the ebcdic values to build

   an ebcdic string.  In reality the array would be (2, 256), to cover all possible ascii values, but

   I have limited it here to only the ones in the ascii string.  I tested this code with the full ascii

   character set on a table with around 250,000 observations, and it took only a couple of seconds to

   execute. */

data accounts2;

set accounts;

length x $ 2;

length EBCDIC_VAL $ 512;

array ascii_ebcdic {2,9} $

  (

  '00' '02' '0C' '40' '42' '44' '80' '9D' 'DF'

  '00' '02' '0C' '7C' 'C2' 'C4' '20' '14' '59'

  );

do i = 1 to length(ascii_string)-1 by 2;

  x='';

  do j = 1 to 9; *this would be j = 1 to 256 in reality;

   if substr(ascii_string,i,2) = trim(ascii_ebcdic(1,j)) then do;

    x = trim(ascii_ebcdic(2,j));

   end;

  end;

  EBCDIC_VAL = cats(EBCDIC_VAL,x);

end;

drop  ascii_ebcdic1-ascii_ebcdic512 x i j;

run;

*see the results;

proc print data = accounts2;

quit;

Now, this works perfectly, and seems to be fast.  This is all very new to me, so I am limited to trial and error.  My problem is that I do not want to have include the array statement and loops every time I want to convert between ascii and ebcdic in a data step, so I thought I would try a user defined function, like in VBA or SQL, then just call it when I need to convert between ascii and ebcdic.  I will post the code below, and I know it does not work, but I wanted you all to know where I am with this.  I sure would appreciate any tips, or links to documents that might describe a similar situation to this one.  Thank you for your time and help so far.

/* I want the array to be a separate entity from the function, so it does not have to populate

   with each call to the function.  When I run this I can see something is created in work.funcs,

   but I don't know how to determine what it is. */

proc fcmp outlib=work.funcs.ascii_ebcdic;

subroutine ascii_ebcdic ();

array ascii_ebcdic [2,9] $ ('00' '02' '0C' '40' '42' '44' '80' '9D' 'DF'

        '00' '02' '0C' '7C' 'C2' 'C4' '20' '14' '59');

endsub;

run;

/* I want this function to be able to reference the array ascii_ebcdic, but I keep getting an error,

   'WARNING: Cannot find a library containing subroutine ASCII_EBCDIC'.  I don't know if this is

   because my array was not created, or because my syntax is not correct.  I looked at several

   documents dealing with functions, but I could not find one specific to this need. */

  

proc fcmp outlib=work.funcs.ascii_ebcdic;

function AsciiToEbcdic(ascii_string) $;

length x $ 2;

length ebcdic_string $ 28;

do i = 1 to length(ascii_string)-1 by 2;

  x='';

  do j = 1 to 9;

   if substr(ascii_string,i,2) = trim(ascii_ebcdic(1,j)) then do;

    x = trim(ascii_ebcdic(2,j));

   end;

  end;

  ebcdic_string = cats(ebcdic_string,x);

end;

return(ebcdic_string);

endsub;

run;

26 REPLIES 26
Tom
Super User Tom
Super User

What are you trying to do?

Why are you not just using built in SAS formats and informats like $HEX and $EBCDIC?

Why are you using an ARRAY instead of just a character variable?

do i = 1 to length(ascii_string)-1 by 2;

  found=0;

  do j = 1 to 18 by 2 while (not found);

   if substr(ascii_string,i,2) = substr('00020C404244809DDF',j,2) then found=j;

  end;

  if found then ebcdic_string = cats(ebcdic_string,substr('00020C7CC2C4201459',found,2));

end;


Why not use other SAS functions like TRANWRD()?

gsnidow
Obsidian | Level 7

Tom, thank you for your reply so quickly.  The problem I am having is that I am getting data from a DB2 database where the column is packed decimal.  To your point, and what I did not show is that I am using $HEX to get the column into a character string.  The issue is that since my machine has ascii as its native character set, the $HEX format produces an ascii string, where I need it to be an ebcdic string, as I would see when using the HEX() function in DB2 SQL.

SAS is all new to me, as in I used it for the first time about 3 or 4 weeks ago, so I don't have a working knowledge of its functions and syntax.  At this point I am 100% trial and error, so when I get something to work, I don't know of a better way.  I do know that there probably is a better way, I just don't know how to get there.

Greg

Tom
Super User Tom
Super User

Use the appropriate S370 format.

For example here is the S370FPD format.

data test;

  input x;

  pd = put(x,s370fpd4.);

  y = input(pd,s370fpd4.);

  put x= pd=$hex8. +1 y=;

cards;

16

78

run;

x=16 pd=0000016C  y=16

x=78 pd=0000078C  y=78

gsnidow
Obsidian | Level 7

Tom, I'll have to spend some time trying to understand what that is doing.  I can't see how I can use that to turn the ascii_string 4200000010900C4D0000000017F04E00000000F12A into the ebcdic_string C200000010300CD400000000268CD500000000495C.  For the ascii string here, I need to look at each set of 2 characters, stepping by 2 and get its corresponding ebcdic value.  For example, I need the code to start at position 1, and take '42' and look at the lookup array to return 'C2'.  Then it needs to go to position 3 of the ascii string, and take '00' and return the ebcdic value of '00', and append it to the previous 'C2', giving me 'C200'.  I need to repeat this look up process until every ascii value in the ascii string has been translated into ebcdic, giving me the final ebcdic string.  If there is a way to do this without using a lookup array, or permanent dataset for that matter, then I am just not seeing it so far.  Thanks.

Tom
Super User Tom
Super User

If you had moved the file from the mainframe to your ASCII machine using Binary then you should be able to read the original values using the S370.... formats.  Did you move the file from the mainframe as ASCII instead binary and you are trying to undo the damage?

You need to know the location in the string of the Packed Decimal values you are trying to read.

How many numbers does the example 21 character string represent?

Tom
Super User Tom
Super User

Why not just use the TRANSLATE function.  Convert your hex digits into actual characters that they represent and then translate them.

data check;

  from='00020C404244809DDF'x;

  to  ='00020C7CC2C4201459'x;

  input @1 asc $hex28. ;

  ebc=translate(asc,to,from);

  format asc ebc $hex.;

  put asc=/ebc=;

cards;

420000009D800C4400000002DF40

run;


asc=420000009D800C4400000002DF40

ebc=C200000014200CC400000002597C

Note that the first three characters are the same in both the source and target translation strings.  So you could just remove those.

  from='404244809DDF'x;

  to  ='7CC2C4201459'x;

gsnidow
Obsidian | Level 7

TRANSLATE looks very promissing, I'll try that out.  To answer your questions from your previous post, the packed decimal fields vary, and are actually an array of values.  In the example data above, each value is 14 characters long.  In some fields they are 38 characters long, it just depends on the field.  Another issue I am having, and it is probably due to my limited knowledge, the same field will vary in length across observations.  One observation might only have one value (14 characters), and the next observation might have 5 values (70 characters).

I am creating a library reference using ODBC to the mainframe so I can query the DB2 tables directly.  We have a work around that uses a passthrough query in which we can use the DB2 HEX() function on the field to give the results we need.  The down side of this is that we can not join to any local table inside the passthrough querries.  By using $HEX512. (or whatever the length of the column is, it varies), I am able to turn the packed decimals into the ascii strings I have posted above.  I have searched long and hard for a format that would give me the ebcdic string that I need, but have come up blank.  I already created a SQL Server function that deals with the issue by converting the packed decimal to binary, then uses a lookup table as I have done in the array above.  I am guessing that since SAS is on my local machine, it assumes I want to see the ascii values, when in reality I need it to behave as if ebcdic was my native character set.

I'll let you know if I have any luck with TRANSLATE.  Thank you so much for your help.

Greg

Tom
Super User Tom
Super User

Perhaps you can post a few strings and the numbers that they represent?

How do you know how many numbers?  Is there a count variable somewhere else? Or is there a termination string? Or does the string just terminate?  If the latter then the LENGTH() function should let you calculate how many 14 character hex digits your string has.

gsnidow
Obsidian | Level 7

Here are 3 examples, all from the same DB2 field.

ASCII_STRINGEBCDIC_STRING
4A000000000029D100000000005D
490000008798F04A0000000082F0C900000017388CD100000000228C
4200000019D8F0430000007FF80C4400000001DF14C200000019808CC300000007700CC400000001593C

The the ascii_string variable comes from using $HEX. on the DB2 column, the ebcdic_string column is what I produced with my array.  Lets look at the ebcdic strings one by one, since those are what I need.

1) D100000000005D

     Characters 3-13 represent a number, and I need to dived by 100 to put the decimal back.  In this case, the number would be 0.05 after stripping off the leading zeros.

     If the first two characters are in ('C3','C7','C8','','C9','D1','D2','D3'), then the entire value in 3-13 gets multiplied by 0, else 1.

     If the last character is in ('B', 'D') then the number is negative, else it is positive.

     So, for this one, the only thing that matters is the initial 'D1', since it renders the value to 0.00, and the rest is moot.

2) C900000017388CD100000000228C

     This one has two values, C900000017388C and D100000000228C.  In both cases, the leading 2 characters of 'C9' and 'D1' render it 0.00, so again the rest is moot.

3) C200000019808CC300000007700CC400000001593C

     This one has three values C200000019808C, C300000007700C, and C400000001593C.

     The first value, having characters 1-2 = 'C2' renders it a non-zero value, so the absolute value becomes 77.00, and since the last character is not in ('B','D'), it has a positive value.

     The second value, having characters 1-2 = 'C3' renders it 0.00, so the rest of it is moot.

     The third value, having characters 1-2 = 'C4', renders it a non-zero absolute value of 15.93, and the last character of 'C' makes it a possitive value.

     So, for this one, having three values, I end up with a positive +77.00, 0.00, and +15.93, which are to be summed, so the total value of the one ebcdic string is 92.93.

I know it may seem convoluted, but that is the data, so I have to deal with it.  A little SQL and a numbers table make light work of dicing it up.

And to your point, If the mainframe field is packed decimal with a length of 78, then I know I need to double that for my ebcdic string.  So I do use the length function to determine how long it needs to be.

Greg

Tom
Super User Tom
Super User

Looks to me like using the $EBCDIC format to convert your values from ASCII to EBCDIC then lets you use the S370FPD6.2 informat on the last 6 characters of each 7 character string.  But your example seems to have reversed which of the first two values in the row with three values should be considered as zero.  So the total sum is different.

data check;

  input hex $70.;

  length ascii ebcdic $7 first $2 value total 8;

  put / _n_=;

  total=0;

  do i=0 to ceil(length(hex)/14) -1 ;

    ascii = input(substr(hex,1+i*14,14),$hex14.);

    ebcdic=put(ascii,$ebcdic7.);

    first= put(ebcdic,$hex2.);

    value = input(substr(ebcdic,2),s370fpd6.2);

    put i= first= value= @;

    if first in ('C3','C7','C8','','C9','D1','D2','D3')  then do;

      value=0; put '-> ' value @;

    end;

    total=sum(total,value);

  end;

    put / total=;

cards;

4A000000000029

490000008798F04A0000000082F0

4200000019D8F0430000007FF80C4400000001DF14

run;


_N_=1

i=0 first=D1 value=-0.05 -> 0

total=0

_N_=2

i=0 first=C9 value=173.88 -> 0 i=1 first=D1 value=2.28 -> 0

total=0

_N_=3

i=0 first=C2 value=198.08 i=1 first=C3 value=77 -> 0 i=2 first=C4 value=15.93

total=214.01

gsnidow
Obsidian | Level 7

Tom

     Thank you so much for taking your time to consider my issue as you have done here.  I'm not sure why the difference, but it could be a translation error on my part.  Anyhow, after reading your solution, I spent some time reading about formats last night, and was a bit overwhelmed at how many there are.  I knew there were more than just char and number, but I had no idea how vast the universe was.  Anyhow, I'll have to take some time do digest it all, so I will post back when I fully understand your code.  Thanks again.

Greg

gsnidow
Obsidian | Level 7

Tom, if you are still willing, I seem to be doing something wrong.  I am trying to use formats to convert the ascii string as you have done, and I am having issues.  Sometimes the format works, and sometimes it does not.  If you would be so kind to look at the below and see if anything stands out to you.  Also, please bear with me with respect to the array, since I know it produces correct values, I have included it as a sanity check.  Below are two examples, on the first of which the format works, and the second of which the format does not work.  In both cases the variable ebcdic_by_array produces the correct values.

data test;

input ascii_string $14.;

length x $ 2;

length ebcdic_by_array $ 14;

array ascii_ebcdic {2,8} $

  (

  '00' '09' '1C' '26' '49' '98' 'CD' 'E6'

  '00' '05' '1C' '50' 'C9' '38' '75' '9C'

        );

do i = 1 to length(ascii_string)-1 by 2;

  x='';

  do j = 1 to 8;

   if substr(ascii_string,i,2) = trim(ascii_ebcdic(1,j)) then do;

    x = trim(ascii_ebcdic(2,j));

   end;

  end;

  ebcdic_by_array = cats(ebcdic_by_array,x);

end;

ascii = input(ascii_string,$hex14.);

ebcdic = put(ascii,$ebcdic7.);

ebcdic_by_format = put(ebcdic,$hex14.);

drop  ascii_ebcdic1-ascii_ebcdic512 x i j;

cards;

4900000009CD1C

490000009826E6

run;

Greg

Tom
Super User Tom
Super User

I am not sure how your loop can work to convert all possible 2 character HEX strings from ASCII to EBCDIC codes.  You are only attempting to convert 8 characters out of the 256 combinations possible with 8 bits.

gsnidow
Obsidian | Level 7

Indeed you are correct.  I only included the values necessary to convert the two strings I used as examples, and I should have stated as such.  If I included all 256 possible values they would have become a jumbled mess here.  In my code I do have the whole character set, and that is what I used to test my million rows.  If there is a way to attach files here I will certainly attach it if you would be interested.  Thank you.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 26 replies
  • 5671 views
  • 6 likes
  • 5 in conversation