JonathanNitzan
Calcite | Level 5

Greetings:

 

(I'm using SAS 9.4 TS Level 1M3 on X64_8HOME.)

 

I’m trying to include Hebrew text in an Annotate data set called ‘panel1’ below:

 

***

data panel1;
length function color $ 8 STYLE $ 18 text $ 100;
hsys='1'; xsys='1'; ysys='1';
label(2,98,'05d005d705d505d605d905dd'x,black,0,0,3.2,'Arial/Unicode/italic',6);
run;

***

 

When included in Proc GPLOT the label statement will generate the following italicized Hebrew string:

 

אחוזים

 

The problem is that, when I have a lot of text, this method is very labour intensive. Is there a simple routine I can use in which I type the Hebrew string in the program and the routine converts it to Unicode?

 

Thank you.

38 REPLIES
Shmuel
Garnet | Level 18

I have done some work on this in the past. The attached program may give you a hint.

Do you need to translate short strings (messages, labels, etc.), a mass of text given in a .txt file, or both? Give me some more details of your needs and I will be happy to help.

data heb_encode(keep= ot str);
  retain i 1;
  heb = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת';
  infile cards dlm='!';
  input str $6.;
  ot = substr(heb,i,2);
  i+2;
  output;
return;
cards;
=D7=90
=D7=91
=D7=92
=D7=93
=D7=94
=D7=95
=D7=96
=D7=97
=D7=98
=D7=99
=D7=9B
=D7=9A
=D7=9C
=D7=9E
=D7=9D
=D7=A0
=D7=9F
=D7=A1
=D7=A2
=D7=A4
=D7=A3
=D7=A6
=D7=A5
=D7=A7
=D7=A8
=D7=A9
=D7=AA
; run;

data cntl;
 set heb_encode;
     retain fmtname '$hebcvf';
     rename ot = start str=label;
run;
proc format lib=mydata cntlin=cntl; run;


data cntl;
 set heb_encode;
     retain fmtname '$cvfheb';
     rename str = start ot=label;
run;
proc format lib=mydata cntlin=cntl; run;

I believe I can adapt the above code to most of your needs.

 

JonathanNitzan
Calcite | Level 5

Thank you Shmuel for the very quick reply and offer of help. To your question:

 

1. I need to convert short text lines only (to be used as series names, short descriptions of chart items, etc.).

 

2. If possible, I would like to type the Hebrew text strings in the program itself and have your routine translate them to Unicode to be included in the Annotate data set.

 

Jonathan

 

 

Shmuel
Garnet | Level 18

I have started to develop translation code but need some more information:

1) Do I understand correctly that you need macro code that does the translation inside SAS statements, which means that running a data step will not be possible?

For example:

title "... %heb2unicode(אחוזים) ... ";

2) You supplied the following line in your first post:

label(2,98,'05d005d705d505d605d905dd'x,black,0,0,3.2,'Arial/Unicode/italic',6);

  Please check:

   - would it work correctly with double quotes instead of single quotes?

   - would it work with code like:

%let ahuzim = 05d005d705d505d605d905dd;
label(2,98,"&ahuzim"x,black,0,0,3.2,'Arial/Unicode/italic',6);

  If so, my idea is to create a macro that generates macro variables with the Unicode hex values you need.

 

3) ASCII code is one byte per character.

    UTF-8 Hebrew code is two bytes per letter.

    As far as I know, 'א' is 'D790'x. I don't see that combination in your label line.

    Can you check the hex combination of each encoded Hebrew letter?

     Try translating, using your method, the following Hebrew string:

      א ב ג ק ר ש ת

     

Shmuel
Garnet | Level 18
Note:
א = 'D790'x is correct on Windows (64-bit OS); these are the UTF-8 bytes of the letter.
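
The two hex values in this exchange describe the same letter in different forms: 'D790'x is the two-byte UTF-8 encoding of א, while the 05d0 in the label line is its Unicode code point. A minimal sketch of the arithmetic that links them, assuming the standard two-byte UTF-8 layout (110xxxxx 10yyyyyy); the variable names b1, b2 and cp are only for illustration:

```sas
data _null_;
   b1 = 0D7x;                            /* lead byte,         binary 110xxxxx */
   b2 = 090x;                            /* continuation byte, binary 10yyyyyy */
   /* subtract the marker bits (C0 and 80), then join 5 + 6 payload bits */
   cp = (b1 - 0C0x) * 64 + (b2 - 080x);
   put cp= hex4.;                        /* cp=05D0 */
run;
```

The same calculation on 'D7AA'x (ת) gives 05EA, so a string of UTF-8 Hebrew letters can be turned into the code-point hex used in the Annotate text one letter at a time.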
JonathanNitzan
Calcite | Level 5

Thank you Shmuel. Here are my answers.

 

1. Yes, I'd like to input the Hebrew letters in the SAS code, similarly to the example you indicate. I'm not sure what you mean by 'running a data step will be not available'.

 

2. The label statement works correctly with double quotes (" "). Your second line of code returned the following three error messages:

ERROR: Undeclared array referenced: label.

ERROR 22-322: Syntax error, expecting one of the following: +, =.

ERROR 76-322: Syntax error, statement will be ignored.

 

3. For the Hebrew code I'm using https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjyluKc0dXtAhVNGlkFHWlvCx0Q...

Shmuel
Garnet | Level 18

I see that you type Hebrew letters.

Please run the following code, retyping the Hebrew letters:

data test_heb;
   length ch $2 cx $4;
    ch = "א";  
    cx = put(ch,$hex4.);
    put ch= cx=;

    ch = "ת";  
    cx = put(ch,$hex4.);
    put ch= cx=;
run;   

Please post the log of that run.

That will tell me the hex range of the Hebrew letters in your environment.

JonathanNitzan
Calcite | Level 5

I ran your code and this is the resulting log:

1131  data test_heb;
1132     length ch $2 cx $4;
1133      ch = "?";
1134      cx = put(ch,$hex4.);
1135      put ch= cx=;
1136
1137      ch = "?";
1138      cx = put(ch,$hex4.);
1139      put ch= cx=;
1140  run;

ch=? cx=3F20
ch=? cx=3F20
Shmuel
Garnet | Level 18

1) Technically it is not possible to run a new data step inside a current data step:

data step1;
 set anydata;
      .... sas statements ...
           data _null_;
               ... any helpful code ...
           run;
      ... more sas statements ...
run;

2) Your answer:

    "The label statement works correctly with double quotes (" ").
    Your second line of code returned the following three error messages:"

    The reason for the error messages: you cannot place %LET inside a PROC.

    Next time, in case of an error, it would be helpful to get the full step log, not just the messages.

    Please run the code following this template:

%let ahuzim = ..... ;   
proc gplot ... ;
    ... statements ...
     label(2,98,"&ahuzim"x,black,0,0,3.2,'Arial/Unicode/italic',6);
    ... more statements ...
run;
   

3) Excellent, now I know the encoding you need for the translated output.

4) Did you retype the Hebrew letters using the keyboard, or did you copy the code I posted as is? Do you use a Hebrew keyboard to type Hebrew letters?

     Different letters cannot have the same hexadecimal value

      cx = 3f20
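
For reference, '3F'x is the question mark and '20'x is a blank, so both letters appear to have been replaced by '?' before the hex value was taken. A minimal sketch for checking which encoding the session runs in, assuming an ASCII-based session; the variable names q and hx are only for illustration:

```sas
/* Show the encoding the SAS session was started with */
proc options option=encoding;
run;

/* The same value is available as an automatic macro variable */
%put NOTE: session encoding is &sysencoding;

/* Reproduce the logged value: a '?' stored in a 2-byte variable */
data _null_;
   length q $2 hx $4;
   q  = '?';                  /* padded with one trailing blank */
   hx = put(q, $hex4.);
   put hx=;                   /* hx=3F20 */
run;
```

If the reported encoding is a single-byte one without Hebrew letters (WLATIN1, for example), that would explain why Hebrew literals typed in the program come out as '?'.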

     

 

Shmuel
Garnet | Level 18

1) That is true for a PROC too: you can't put a data step inside a PROC step, like:
PROC GPLOT;
...... data _NULL_; ... any helpful code ...; run; ..
run;

JonathanNitzan
Calcite | Level 5

Shmuel:

 

1. For the two of us to be on the same page, I enclose here (1) the sas program file, (2) the sas dataset, and (3) the PDF version of the SVG output file (SVG format cannot be uploaded here).

2. Note that the enclosed program successfully uses your %let ahuzim macro.
3. Regarding the test_heb dataset below, I did type in the Hebrew letters as you asked, and the results are shown below.

 

Thank you,

Jonathan

data test_heb;
   length ch $2 cx $4;
    ch = "א";  
    cx = put(ch,$hex4.);
    put ch= cx=;

    ch = "ת";  
    cx = put(ch,$hex4.);
    put ch= cx=;
run;   
3735  data test_heb;
3736     length ch $2 cx $4;
3737      ch = "?";
3738      cx = put(ch,$hex4.);
3739      put ch= cx=;
3740
3741      ch = "?";
3742      cx = put(ch,$hex4.);
3743      put ch= cx=;
3744  run;

ch=? cx=3F20
ch=? cx=3F20
NOTE: The data set WORK.TEST_HEB has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

Shmuel
Garnet | Level 18

I worked on two different methods.

I tried to develop macro programs but got stuck.

Regarding the log: I cannot understand why you got ch=? (instead of ch=א / ch=ת).

I ran the same code and got what I expected:

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         data test_heb;
 74            length ch $2 cx $4;
 75             ch = "א";
 76             cx = put(ch,$hex4.);
 77             put ch= cx=;
 78         
 79             ch = "ת";
 80             cx = put(ch,$hex4.);
 81             put ch= cx=;
 82         run;
 
 ch=א cx=D790
 ch=ת cx=D7AA

Finally, I developed the following code, which may solve your problem.

The program creates macro variables with names you assign, so you can use them again and again in any graph you generate.

You can %include the program in your autoexec.sas and all the macro variables will be available to you throughout the SAS session.

Other benefits are:

- easy to add strings to translate

- less coding when developing graphs.

Please try the following code and don't hesitate to post if there is any issue.

Possible issues:

1) special characters that need to be translated

2) order of substrings (such as: xxx 2 instead of 2 xxx ?! )

 

/* translate Hebrew text to UNICODE */
%let kbd_aleph = א;    /* RETYPE with local keyboard */
%let uni_aleph = 1488; /* ALEF Unicode = '05D0'x */
%let ch_len = 2;

data Heb_Table;
  length string_in string_out $130     /* Hebrew 2 bytes/char include spaces */
         aleph $2;
  retain aleph "&kbd_aleph" delta;
  if _N_=1 then do;
     delta = rank(substr(aleph,1,1))*256 + rank(substr(aleph,2,1)) - &uni_aleph;
     put DELTA=;
  end;
  
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char120.;
  
  len = length(string_in);
  string_out = '';
  
  i=1;
  do until(i gt length(string_in));  /* gt, so a final one-byte character is not skipped */
     if substr(string_in,i,1) = substr(aleph,1,1) 
        then by=2;
        else by=1;
     char = substr(string_in,i,by);
     if by=2 then do;
        cn = rank(substr(char,1,1))*256 + rank(substr(char,2,1)) - delta; /* &uni_aleph; */
        ch = put(byte(int(cn/256)) || byte(int(cn-int(cn/256)*256)), $hex4.);
     end;
     else ch = put(strip(char), $hex2.);
     
     string_out = cats(string_out,ch);
     i = i + by;
  end;
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines;     /*** use TAB between VARNAME and the string to translate ***/
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה"כ
;
run;
 
%put ahuzim = &ahuzim;
%put std1 = &std1;
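
The DELTA logic in the step above can be checked by hand. A minimal sketch, assuming UTF-8 input in which every Hebrew letter starts with the byte D7 (the variable names delta and tav are only for illustration): the two bytes of aleph read as one number, minus its code point 1488, give a constant offset that holds for the whole alphabet.

```sas
data _null_;
   delta = 0D790x - 1488;     /* 55184 - 1488 = 53696               */
   tav   = 0D7AAx - delta;    /* 55210 - 53696 = 1514, i.e. '05EA'x */
   put delta= tav= hex4.;     /* delta=53696 tav=05EA               */
run;
```

A single offset is enough here only because the Hebrew letters sit in one contiguous block behind the same lead byte; text in other scripts would need the general UTF-8 decoding instead.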

This created the macro variables, as displayed in the log:

 121        run;
 122        
 123        %put ahuzim = &ahuzim;
 ahuzim = 05D005D705D505D605D905DD
 124        %put std1 = &std1;
 std1 = 2B3205E105D805D905D505EA05EA05E705DF
 125        
 

 

 

 

 

 

Shmuel
Garnet | Level 18

@EyalGonen , I have copied part of @JonathanNitzan 's post, and it seems strange to me.

Why was a typed character, printable in the program editor, replaced by a question mark '?' when displayed in the log?

The program:

data test_heb;
   length ch $2 cx $4;
    ch = "א";  
    cx = put(ch,$hex4.);
    put ch= cx=;

    ch = "ת";  
    cx = put(ch,$hex4.);
    put ch= cx=;
run;  

The log:

3735  data test_heb;
3736     length ch $2 cx $4;
3737      ch = "?";
3738      cx = put(ch,$hex4.);
3739      put ch= cx=;
3740
3741      ch = "?";
3742      cx = put(ch,$hex4.);
3743      put ch= cx=;
3744  run;

ch=? cx=3F20
ch=? cx=3F20
NOTE: The data set WORK.TEST_HEB has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Is this the result of an encoding issue, or of some other system option?

 

 

 

 

EyalGonen
Lapis Lazuli | Level 10

Hi @JonathanNitzan 

 

Check out the code below. You may also note there is a "SAS Users in Israel" community for Hebrew related questions.

 

data _null_;
    win1255name = "אבג דהו";
    put win1255name $hex20.;
    /* convert to Hebrew DOS */
    pcoemname = kcvt(win1255name,"pcoem862");
    put pcoemname $hex20.;
    /* convert to UTF8 */
    utfname = kcvt(win1255name,"utf8");
    put utfname $hex20.;
    /* convert to Unicode NCR */
    utf8ncr = unicodec(win1255name,"NCR");
    put utf8ncr ;
    /* convert to Unicode ESC */
    utf8esc = unicodec(win1255name,"ESC");
    put utf8esc ;
    /* convert back to Hebrew using unicode or kcvt */
    win1255name2 = unicode(utfname,"utf8");
    put win1255name2 $hex20.;
run;

E0E1E220E3E4E5                                 <- Win 1255 Hebrew
80818220838485202020                           <- DOS Hebrew
D790D791D79220D793D7                           <- UTF8 Hebrew
E0E1E220E3E4E5202020                           <- Win 1255 Hebrew
&#1488;&#1489;&#1490; &#1491;&#1492;&#1493;    <- UTF8 NCR
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5          <- UTF8 ESC
E0E1E220E3E4E5202020
Shmuel
Garnet | Level 18

@JonathanNitzan , if a string in the form of the UNICODEC function output, like

92  %put Ahuzim=&ahuzim;
 Ahuzim=\u05D0\u05D7\u05D5\u05D6\u05D9\u05DD

is accepted by the %label macro you use, it simplifies the code, and the function is maintained by SAS. The adapted code would be:

/*=====================================*/
/* Eyal Gonen - using unicodec function */

data test;
  length string_in $80 string_out $200;  /* \uXXXX output needs 6 bytes per Hebrew letter */
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char80.;
  string_out = unicodec(string_in);
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines;     /*** use TAB between VARNAME and the string to translate ***/
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה"כ
;
run;
%put Ahuzim=&ahuzim; 

 
