Re: Is there a routine to convert Hebrew text to Unicode in an Annotate data set?

JonathanNitzan · Posted 01-14-2021 01:40 PM

Greetings:

(I'm using SAS 9.4 TS Level 1M3 on X64_8Home.)

I’m trying to include Hebrew text in an Annotate data set called ‘panel1’ below:

***

data panel1;

length function color $ 8 STYLE $ 18 text $ 100;

hsys='1'; xsys='1'; ysys='1';

label(2,98,'05d005d705d505d605d905dd'x,black,0,0,3.2,'Arial/Unicode/italic',6);

run;

***

When included in Proc GPLOT the label statement will generate the following italicized Hebrew string:

אחוזים

The problem is that this method is very labour-intensive when I have a lot of text. Is there a simple routine in which I type the Hebrew string in the program and the routine converts it to Unicode?

Thank you.

Shmuel · Posted 01-14-2021 03:35 PM

I did some work on this in the past. The attached program may give you a hint.

Do you need to translate short strings (messages, labels, etc.), a mass of text in a .txt file, or both? Give me some more details about your needs and I will be happy to help.

data heb_encode(keep= ot str);
  retain i 1;
  heb = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת';
  infile cards dlm='!';
  input str $6.;
  ot = substr(heb,i,2);
  i+2;
  output;
return;
cards;
=D7=90
=D7=91
=D7=92
=D7=93
=D7=94
=D7=95
=D7=96
=D7=97
=D7=98
=D7=99
=D7=9B
=D7=9A
=D7=9C
=D7=9E
=D7=9D
=D7=A0
=D7=9F
=D7=A1
=D7=A2
=D7=A4
=D7=A3
=D7=A6
=D7=A5
=D7=A7
=D7=A8
=D7=A9
=D7=AA
; run;

data cntl;
 set heb_encode;
     retain fmtname '$hebcvf';
     rename ot = start str=label;
run;
proc format lib=mydata cntlin=cntl; run;


data cntl;
 set heb_encode;
     retain fmtname '$cvfheb';
     rename str = start ot=label;
run;
proc format lib=mydata cntlin=cntl; run;

I believe I can adapt the above code to most of your needs.

JonathanNitzan · Posted 01-14-2021 04:42 PM

Thank you Shmuel for the very quick reply and offer of help. To your question:

1. I need to convert short text lines only (to be used as series names, short descriptions of chart items, etc.).

2. If possible, I would like to type the Hebrew text strings in the program itself and have your routine translate them to Unicode to be included in the Annotate data set.

Jonathan

Shmuel · Posted 01-15-2021 12:13 PM

I have started to develop translation code but need some more information:

1) Do I understand correctly that you need macro code that does the translation inside SAS statements, which means that running a data step will not be available?

For example:

title "... %heb2unicode(אחוזים) ... ";

2) You supplied the following line in your first post:

label(2,98,'05d005d705d505d605d905dd'x,black,0,0,3.2,'Arial/Unicode/italic',6);

Please check:

- Does it work correctly with double quotes instead of single quotes?

- Does it work with code like this:

%let ahuzim = 05d005d705d505d605d905dd;
label(2,98,"&ahuzim"x,black,0,0,3.2,'Arial/Unicode/italic',6);

If so, my idea is to create a macro that generates macro variables holding the Unicode hex values you need.

3) ASCII uses one byte per character.

UTF-8 uses two bytes per Hebrew letter.

As far as I know, 'א' is 'D790'x. I don't see that combination in your label line.

Can you check the hex value of each encoded Hebrew letter?

Try translating the following Hebrew string with your method:

א ב ג ק ר ש ת

Shmuel · Posted 01-15-2021 02:39 PM

Note:
א = 'D790'x is correct on Windows (64-bit OS).
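
The two hex values in this exchange do not contradict each other: 'D790'x is the UTF-8 byte sequence for א, while 05D0 is its Unicode code point (U+05D0), which is the form the hex literal in the label line uses. A minimal sketch to view both side by side, assuming the SAS session runs with a UTF-8 encoding and that the 'UTF16B' type of UNICODEC is available in your release (variable names are illustrative only):

```sas
/* Assumption: the SAS session encoding is UTF-8. */
data _null_;
  length ch $2 u16 $2 utf8_hex u16_hex $4;
  ch  = 'D790'x;                  /* UTF-8 bytes of ALEF                 */
  u16 = unicodec(ch, 'UTF16B');   /* same letter as UTF-16 big-endian    */
  utf8_hex = put(ch,  $hex4.);    /* hex view of the UTF-8 bytes         */
  u16_hex  = put(u16, $hex4.);    /* hex view of the UTF-16 code unit    */
  put utf8_hex= u16_hex=;
run;
```

If the second value matches the first four hex digits of the working label string, the label line is using code points rather than UTF-8 bytes.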

JonathanNitzan · Posted 01-15-2021 05:33 PM

Thank you Shmuel. Here are my answers.

1. Yes, I'd like to input the Hebrew letters in the SAS code, similarly to the example you indicate. I'm not sure what you mean by 'running a data step will be not available'.

2. The label statement works correctly with double quotes (" "). Your second line of code returned the following three error messages:

ERROR: Undeclared array referenced: label.

ERROR 22-322: Syntax error, expecting one of the following: +, =.

ERROR 76-322: Syntax error, statement will be ignored.

3. For the Hebrew code I'm using https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjyluKc0dXtAhVNGlkFHWlvCx0Q...

Shmuel · Posted 01-15-2021 04:24 PM

I see that you type Hebrew letters.

Please run the following code, retyping the Hebrew letters:

data test_heb;
   length ch $2 cx $4;
    ch = "א";  
    cx = put(ch,$hex4.);
    put ch= cx=;

    ch = "ת";  
    cx = put(ch,$hex4.);
    put ch= cx=;
run;

Please post the log of that run.

That will tell me the Hebrew hex range in your environment.

JonathanNitzan · Posted 01-15-2021 05:37 PM

I ran your code and this is the resulting log:

1131  data test_heb;
1132     length ch $2 cx $4;
1133      ch = "?";
1134      cx = put(ch,$hex4.);
1135      put ch= cx=;
1136
1137      ch = "?";
1138      cx = put(ch,$hex4.);
1139      put ch= cx=;
1140  run;

ch=? cx=3F20
ch=? cx=3F20

Shmuel · Posted 01-15-2021 09:22 PM

1) Technically, it is not possible to run a new data step inside the current data step:

data step1;
 set anydata;
      .... sas statements ...
           data _null_;
               ... any helpful code ...
           run;
      ... more sas statements ...
run;

2) Your answer:

"The label statement works correctly with double quotes (" ").
Your second line of code returned the following three error messages:"

The reason for the error messages: you cannot place %LET inside a PROC.

Next time you get an error, it would be helpful to post the full step log, not just the messages.

Please run the code following this template:

%let ahuzim = ..... ;   
proc gplot ... ;
    ... statements ...
     label(2,98,"&ahuzim"x,black,0,0,3.2,'Arial/Unicode/italic',6);
    ... more statements ...
run;

3) Excellent, now I have the encoding for the translated output you need.

4) Did you retype the Hebrew letters using the keyboard, or did you copy the code I posted as is? Do you use a Hebrew keyboard to type Hebrew letters?

It should be impossible for different letters to show the same hexadecimal value:

cx = 3f20

Shmuel · Posted 01-15-2021 09:26 PM

1) The same is true for PROC steps: you can't run a data step inside a PROC step, as in:
PROC GPLOT;
...... data _NULL_; ... any helpful code ...; run; ..
run;

JonathanNitzan · Posted 01-16-2021 01:36 PM

Shmuel:

1. For the two of us to be on the same page, I enclose here (1) the sas program file, (2) the sas dataset, and (3) the PDF version of the SVG output file (SVG format cannot be uploaded here).

2. Note that the enclosed program successfully uses your %let ahuzim macro.
3. Regarding the test_heb dataset below, I did type in the Hebrew letters as you asked, and the results are shown below.

Thank you,

Jonathan

data test_heb;
   length ch $2 cx $4;
    ch = "א";  
    cx = put(ch,$hex4.);
    put ch= cx=;

    ch = "ת";  
    cx = put(ch,$hex4.);
    put ch= cx=;
run;

3735  data test_heb;
3736     length ch $2 cx $4;
3737      ch = "?";
3738      cx = put(ch,$hex4.);
3739      put ch= cx=;
3740
3741      ch = "?";
3742      cx = put(ch,$hex4.);
3743      put ch= cx=;
3744  run;

ch=? cx=3F20
ch=? cx=3F20
NOTE: The data set WORK.TEST_HEB has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Shmuel · Posted 01-16-2021 04:55 PM

I worked on two different methods.

I tried to develop macro programs but got stuck.

Regarding the log: I cannot understand why you got ch=? (instead of ch=א / ch=ת).

I ran the same code and, as expected, got:

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         data test_heb;
 74            length ch $2 cx $4;
 75             ch = "א";
 76             cx = put(ch,$hex4.);
 77             put ch= cx=;
 78         
 79             ch = "ת";
 80             cx = put(ch,$hex4.);
 81             put ch= cx=;
 82         run;
 
 ch=א cx=D790
 ch=ת cx=D7AA

Finally, I developed the following code, which may solve your problem.

The program creates macro variables with names you assign, so you can use them again and again in any graph you generate.

You can %include the program in your autoexec.sas, and all the macro variables will be available to you throughout the SAS session.

Other benefits are:

- it is easy to add strings to translate

- less coding when developing graphs.

Please try the following code, and don't hesitate to post if you run into any issue.

Possible issues:

1) special characters that need to be translated

2) order of substrings (such as: xxx 2 instead of 2 xxx ?! )

/* translate Hebrew text to UNICODE */
%let kbd_aleph = א;    /* RETYPE with local keyboard */
%let uni_aleph = 1488; /* ALEF Unicode = '05D0'x */
%let ch_len = 2;

data Heb_Table;
  length string_in string_out $130     /* Hebrew 2 bytes/char include spaces */
         aleph $2;
  retain aleph "&kbd_aleph" delta;
  if _N_=1 then do;
     delta = rank(substr(aleph,1,1))*256 + rank(substr(aleph,2,1)) - &uni_aleph;
     put DELTA=;
  end;
  
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char120.;
  
  len = length(string_in);
  string_out = '';
  
  i=1;
  do while(i le len);   /* process every byte position, including a trailing one-byte character */
     if substr(string_in,i,1) = substr(aleph,1,1) 
        then by=2;
        else by=1;
     char = substr(string_in,i,by);
     if by=2 then do;
        cn = rank(substr(char,1,1))*256 + rank(substr(char,2,1)) - delta; /* &uni_aleph; */
        ch = put(byte(int(cn/256)) || byte(int(cn-int(cn/256)*256)), $hex4.);
     end;
     else ch = put(strip(char), $hex2.);
     
     string_out = cats(string_out,ch);
     i = i + by;
  end;
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines;     /*** use TAB between VARNAME and the string to translate ***/
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה"כ
;
run;
 
%put ahuzim = &ahuzim;
%put std1 = &std1;

The macro variables created, as displayed in the log:

 121        run;
 122        
 123        %put ahuzim = &ahuzim;
 ahuzim = 05D005D705D505D605D905DD
 124        %put std1 = &std1;
 std1 = 2B3205E105D805D905D505EA05EA05E705DF
 125
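
The DELTA step in the program above relies on the Hebrew letters occupying one contiguous run in both encodings: in UTF-8 they are 'D790'x through 'D7AA'x, and as code points they are 05D0 through 05EA, so subtracting a single constant maps one range onto the other. A small sketch of that arithmetic for ת, using only the hex values quoted in this thread (variable names are illustrative):

```sas
data _null_;
  delta   = input('D790', hex4.) - 1488;   /* UTF-8 value of ALEF minus its code point  */
  tav     = input('D7AA', hex4.) - delta;  /* UTF-8 value of TAV shifted by that delta  */
  tav_hex = put(tav, hex4.);               /* code point written as four hex digits     */
  put delta= tav= tav_hex=;
run;
```

Worked by hand: 'D790'x is 55184, so delta is 55184 - 1488 = 53696; 'D7AA'x is 55210, and 55210 - 53696 = 1514, which is 05EA in hex.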

Shmuel · Posted 01-19-2021 11:01 AM

@EyalGonen , I have copied a part of @JonathanNitzan 's post, and it seems strange to me.

Why was a typed character, printable in the program editor, replaced by a question mark '?' when displayed in the log?

The program:

data test_heb;
   length ch $2 cx $4;
    ch = "א";  
    cx = put(ch,$hex4.);
    put ch= cx=;

    ch = "ת";  
    cx = put(ch,$hex4.);
    put ch= cx=;
run;

The log:

3735  data test_heb;
3736     length ch $2 cx $4;
3737      ch = "?";
3738      cx = put(ch,$hex4.);
3739      put ch= cx=;
3740
3741      ch = "?";
3742      cx = put(ch,$hex4.);
3743      put ch= cx=;
3744  run;

ch=? cx=3F20
ch=? cx=3F20
NOTE: The data set WORK.TEST_HEB has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Is this the result of an encoding issue, or of some other system option?
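
The value cx=3F20 is a question mark ('3F'x) followed by a blank ('20'x), which suggests the Hebrew letter had already been replaced by a substitution character before the data step ran. One possible cause is a session encoding that has no Hebrew characters (for example wlatin1); that is an assumption about the setup, not something the posted log confirms. A quick way to check which encoding a session is running under:

```sas
/* Show the ENCODING system option of the current session in the log */
proc options option=encoding;
run;

/* Same information through the macro facility */
%put Session encoding: %sysfunc(getoption(encoding));
```

If the reported encoding cannot represent Hebrew, starting SAS with a Hebrew-capable or UTF-8 encoding would be the first thing to try before debugging the conversion code itself.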

EyalGonen · Posted 01-17-2021 02:37 AM

Hi @JonathanNitzan

Check out the code below. You may also note there is a "SAS Users in Israel" community for Hebrew related questions.

data _null_;
    win1255name = "אבג דהו";
    put win1255name $hex20.;
    /* convert to Hebrew DOS */
    pcoemname = kcvt(win1255name,"pcoem862");
    put pcoemname $hex20.;
    /* convert to UTF8 */
    utfname = kcvt(win1255name,"utf8");
    put utfname $hex20.;
    /* convert to Unicode NCR */
    utf8ncr = unicodec(win1255name,"NCR");
    put utf8ncr ;
    /* convert to Unicode ESC */
    utf8esc = unicodec(win1255name,"ESC");
    put utf8esc ;
    /* convert back to Hebrew using unicode or kcvt */
    win1255name2 = unicode(utfname,"utf8");
    put win1255name2 $hex20.;
run;

E0E1E220E3E4E5                                 <- Win 1255 Hebrew
80818220838485202020                           <- DOS Hebrew
D790D791D79220D793D7                           <- UTF8 Hebrew
E0E1E220E3E4E5202020                           <- Win 1255 Hebrew
&#1488;&#1489;&#1490; &#1491;&#1492;&#1493;    <- UTF8 NCR
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5          <- UTF8 ESC
E0E1E220E3E4E5202020

Shmuel · Posted 01-17-2021 03:16 AM

@JonathanNitzan , if a string in the form of the UNICODEC function output, like

92  %put Ahuzim=&ahuzim;
 Ahuzim=\u05D0\u05D7\u05D5\u05D6\u05D9\u05DD

is accepted by the %label macro you use, it simplifies the code, and the conversion logic is maintained by SAS rather than by us. The adapted code would be:

/*=====================================*/
/* Eyal Gonen - using the UNICODEC function */ 

data test;
  length string_in $80 string_out $480;   /* each non-ASCII character becomes a 6-byte \uXXXX escape */
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char80.;
  string_out = unicodec(string_in);
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines;     /*** use TAB between VARNAME and the string to translate ***/
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה"כ
;
run;
%put Ahuzim=&ahuzim;
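
If the Annotate call needs the plain hex digits (as in '05d005d7...'x) rather than the \u escape form, one variant is to ask UNICODEC for UTF-16 big-endian bytes and print them with $HEX. This is only a sketch: it assumes the session encoding can represent Hebrew (for example UTF-8), that the 'UTF16B' type is available in your release, and that the text is at most 40 characters; the names string_in, u16, hex and ahuzim_hex are hypothetical.

```sas
/* Assumptions: Hebrew-capable session encoding; names are illustrative. */
data _null_;
  length string_in $80 u16 $80 hex $160;
  string_in = 'אחוזים';
  u16 = unicodec(strip(string_in), 'UTF16B');   /* 2 bytes per Hebrew/ASCII character  */
  n   = 2 * klength(strip(string_in));          /* number of UTF-16 bytes to keep      */
  hex = putc(substr(u16, 1, n), cats('$hex', 2*n, '.'));  /* 2 hex digits per byte     */
  call symputx('ahuzim_hex', hex);              /* store as a global macro variable    */
run;
%put ahuzim_hex = &ahuzim_hex;
```

A reasonable check is to compare the result with the value already shown in this thread, ahuzim = 05D005D705D505D605D905DD. Unlike the byte-by-byte program earlier, this route would write spaces and ASCII characters as four hex digits too (for example 0020 for a blank), which keeps every character the same width in the hex literal.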
