Calcite | Level 5

Hi Eyal:

I ran your code (after retyping the Hebrew letters) and this is the log I get:

2034  data _null_;
2035      win1255name = "??? ???";
2036      put win1255name $hex20.;
2037      /* convert to Hebrew DOS */
2038      pcoemname = kcvt(win1255name,"pcoem862");
2039      put pcoemname $hex20.;
2040      /* convert to UTF8 */
2041      utfname = kcvt(win1255name,"utf8");
2042      put utfname $hex20.;
2043      /* convert to Unicode NCR */
2044      utf8ncr = unicodec(win1255name,"NCR");
2045      put utf8ncr ;
2046      /* convert to Unicode ESC */
2047      utf8esc = unicodec(win1255name,"ESC");
2048      put utf8esc ;
2049      /* convert back to Hebrew using unicode or kcvt */
2050      win1255name2 = unicode(utfname,"utf8");
2051      put win1255name2 $hex20.;
2052  run;

??? ???
??? ???
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Lapis Lazuli | Level 10

Hi @JonathanNitzan 


1. The code I posted assumes your SAS session is invoked with encoding=Hebrew (on Linux/Unix) or encoding=WHebrew (on Windows). You can verify this by running "proc options option=encoding; run;" in your SAS session.


2. The code I posted shows the various SAS functions you can use to convert non-Unicode text to Unicode (UTF-8). To run it, make sure you put valid Hebrew text into the variable "win1255name". In the log you sent, this variable's value contains question marks instead of Hebrew characters. Correct the question marks and rerun. I hope one of the functions in the code is the one you are looking for.
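Item 1 can be checked with the PROC OPTIONS step mentioned above. As a small sketch, the automatic macro variable SYSENCODING holds the same session encoding, so it can also be written to the log with %PUT. The example values in the comment are typical sessions, not output taken from this thread.

```sas
/* Write the ENCODING system option to the log */
proc options option=encoding;
run;

/* SYSENCODING holds the session encoding as well,        */
/* e.g. wlatin1 with the nls\en config, utf-8 with nls\u8 */
%put NOTE: Session encoding is &sysencoding;
```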




Calcite | Level 5

1. I have sorted out the encoding issue (with your help and SAS customer support). The problem was that my SAS session was invoked with the following Target:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"

I changed the "en" to "u8", and I now have the needed "ENCODING=UTF-8".


2. As I mentioned, there are so many postings now that I find them hard to sort out. So let's start with your suggestion, Eyal:

data _null_;
    win1255name = "אבג דהו";
    put win1255name $hex20.;
    /* convert to Hebrew DOS */
    pcoemname = kcvt(win1255name,"pcoem862");
    put pcoemname $hex20.;
    /* convert to UTF8 */
    utfname = kcvt(win1255name,"utf8");
    put utfname $hex20.;
    /* convert to Unicode NCR */
    utf8ncr = unicodec(win1255name,"NCR");
    put utf8ncr ;
    /* convert to Unicode ESC */
    utf8esc = unicodec(win1255name,"ESC");
    put utf8esc ;
    /* convert back to Hebrew using unicode or kcvt */
    win1255name2 = unicode(utfname,"utf8");
    put win1255name2 $hex20.;
run;

Running this code, I get the following log.

6    data _null_;
7        win1255name = "אבג דהו";
8        put win1255name $hex20.;
9        /* convert to Hebrew DOS */
10       pcoemname = kcvt(win1255name,"pcoem862");
11       put pcoemname $hex20.;
12       /* convert to UTF8 */
13       utfname = kcvt(win1255name,"utf8");
14       put utfname $hex20.;
15       /* convert to Unicode NCR */
16       utf8ncr = unicodec(win1255name,"NCR");
17       put utf8ncr ;
18       /* convert to Unicode ESC */
19       utf8esc = unicodec(win1255name,"ESC");
20       put utf8esc ;
21       /* convert back to Hebrew using unicode or kcvt */
22       win1255name2 = unicode(utfname,"utf8");
23       put win1255name2 $hex20.;
24   run;

אבג דהו
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5

Of the 6 different options, the (almost) correct function is:

 utf8esc = unicodec(win1255name,"ESC");

Which produces:

\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5

However, this function (1) adds "\" and "u" in front of each code point, which my %label statement doesn't understand; and (2) leaves an actual space between the two Hebrew words instead of the code "0020" for a space.


3. Is there a way to "clean" the resulting string so that it contains only the necessary code points? I won't be surprised if Shmuel has already solved this in his posts, but I'd appreciate a fresh summary instead of having to sort it out myself.
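A minimal sketch for item 3. It assumes the consumer (here the %label statement) wants bare four-digit code points, with a blank written as 0020. The approach first encodes blanks as \u0020 and then removes only the \u prefixes. TRANSTRN with TRIMN('') (a zero-length replacement) is used here instead of COMPRESS(..., '\u'), because COMPRESS also drops every backslash and every lowercase u elsewhere in the text.

```sas
data _null_;
  length esc clean $200;
  heb = "אבג דהו";                  /* assumes a session that can hold Hebrew */
  esc = unicodec(heb, "ESC");       /* \u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5 */
  /* UNICODEC leaves ASCII as is, so encode each blank as \u0020 first; */
  /* TRIM keeps the padding blanks of ESC from being encoded as well    */
  esc = tranwrd(trim(esc), " ", "\u0020");
  /* Remove only the \u prefixes; TRIMN('') is a zero-length replacement */
  clean = transtrn(esc, "\u", trimn(''));
  put clean=;
run;
```

With the ESC string shown in the log above, this should print clean=05D005D105D2002005D305D405D5.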




Lapis Lazuli | Level 10
If you decided to switch to UTF-8 encoding, then you do not need my code, because it uses SAS functions to convert Hebrew- or WHebrew-encoded text to UTF-8. If you are now running SAS with UTF-8 encoding, there is nothing left to convert: whatever you type in Hebrew is already encoded in UTF-8.
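To confirm that a literal typed in a UTF-8 session is already UTF-8, you can dump its bytes with $HEX. This sketch assumes the session was started with the u8 config. Each Hebrew letter takes two bytes, so LENGTH (which counts bytes) and KLENGTH (which counts characters) return different values.

```sas
/* Assumes SAS was started with ENCODING=UTF-8 (the nls\u8 config) */
data _null_;
  heb = "אבג";
  put heb $hex12.;        /* D790D791D792: two bytes per letter */
  nbytes = length(heb);   /* LENGTH counts bytes: 6             */
  nchars = klength(heb);  /* KLENGTH counts characters: 3       */
  put nbytes= nchars=;
run;
```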
Calcite | Level 5

Eyal: This is the best "right under your nose" line I have read in a while! It works like a charm, as it should.

Shmuel: Thank you for all the work, which is truly appreciated, regardless.


Garnet | Level 18

@JonathanNitzan I'm curious to know whether, now that you run SAS with encoding=UTF-8, you still need the translation from UTF-8 to Unicode. Maybe you can just write the Hebrew string as is.


In case translation is still needed, you can even use the unicodec() function in a macro, as in the next log:

73         %macro uni(str);
74             %local x;
75             %let x=%sysfunc(unicodec(&str));
76             %sysfunc(compress(&x,\u));
77         %mend;
78         data _null_;
79           text= "*** Testing - %uni(אחוזים) ***";
80           put text=;
81         run;
text=*** Testing - 05D005D705D505D605D905DD; ***
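The trailing semicolon in 05DD; comes from the semicolon after the last %sysfunc call. Inside a macro body that semicolon is plain text, so it becomes part of what %uni returns. Here is a sketch of the same macro without it, otherwise unchanged:

```sas
%macro uni(str);
  %local x;
  %let x=%sysfunc(unicodec(&str));
  /* No semicolon after the next call: inside a macro body it would be */
  /* returned as text, which is where 05DD; in the log came from        */
  %sysfunc(compress(&x,\u))
%mend uni;

data _null_;
  text = "*** Testing - %uni(אחוזים) ***";
  put text=;
run;
```

The PUT should then show the same code points as the log above, without the stray semicolon.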
Calcite | Level 5
There is no longer a need for translation. I can simply type the Hebrew text into the %label statement.

Many thanks for your help!
Garnet | Level 18

@EyalGonen, thanks, you were very helpful.

@JonathanNitzan, you have nearly reached the end.

I am going to summarize this thread.


1) The thread deals with two different issues:

     1.1  Encoding issue: the ENCODING had to be changed from WLATIN1 to UTF-8.

            You solved it by changing the SAS invocation command, from:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" 
  -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"

         by replacing en with u8.


     1.2 Converting Hebrew strings from UTF-8 to Unicode code points, to be used with SAS graphs.

           I proposed three different programs for it.

           I shall present them here again in my order of preference:


METHOD 1 - Recommended

data test;
  length string_in string_out $80;
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char80.;
  /* UNICODEC defaults to the ESC type (\u05D0...); COMPRESS then removes */
  /* every backslash and every lowercase u, including any in ASCII text    */
  string_out = compress(unicodec(string_in), '\u');
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines;     /*** use TAB between VARNAME and the string to translate ***/
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה""כ
;
run;
%put Ahuzim = &ahuzim;
%put STD1 = &std1;
%put MIKRA = &mikra;
%put TOTAL = &total;

METHOD 2 - Using a format for translation:

data heb2uni; 
  retain fmtname '$h2u';
  length heb $2 uni $4;
  infile cards;
  input uni $ heb $;
cards;
05D0 א
05D1 ב
05D2 ג
05D3 ד
05D4 ה
05D5 ו
05D6 ז
05D7 ח
05D8 ט
05D9 י
05DA ך
05DB כ
05DC ל
05DD ם
05DE מ
05DF ן
05E0 נ
05E1 ס
05E2 ע
05E3 ף
05E4 פ
05E5 ץ
05E6 צ
05E7 ק
05E8 ר
05E9 ש
05EA ת
; run;
proc format lib=work
     cntlin=heb2uni(rename=(heb=start uni=label));
run;

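data HebTable;
   length hebstr unistr $80 ch $4;
   infile cards dlm='09'x truncover;
   input varname $ hebstr $;
   unistr = ''; hebstr = strip(hebstr);
   /* Step one character at a time: KLENGTH/KSUBSTR count characters, */
   /* not bytes, so the 1-byte signs and blanks in the data do not     */
   /* shift the 2-byte Hebrew letters out of alignment                 */
   do i=1 to klength(hebstr);
      ch = put(ksubstr(hebstr,i,1), $h2u.);
      unistr = cats(unistr,ch);
   end;
   keep varname hebstr unistr;
   putlog varname= hebstr= unistr=;
   call symput(varname,strip(unistr));
cards;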
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה""כ
;
run;
%put AHUZIM = &ahuzim;
%put STD1 = &std1;
%put AVG = &avg;
%put TOTAL = &total;

/*** usage example ***/
data _null_;
  txt1 = "(xAHUZIM= &ahuzim xxx)";
  put txt1=;
  txt2 = "(xTOTAL= &total xxx)";
  put txt2=;
run;
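The 27 rows of the $h2u table are the contiguous code points U+05D0 (alef) to U+05EA (tav), final forms included, so the control data set can also be generated instead of typed. This sketch assumes that the UNICODE function, given type 'ESC', converts a \uXXXX escape back to the session encoding (the reverse of UNICODEC in the thread). The variables are named FMTNAME, START and LABEL, as CNTLIN expects, so no RENAME is needed.

```sas
/* Build the $h2u control data set from the code-point range */
/* U+05D0 (alef) .. U+05EA (tav) instead of typing 27 rows    */
data heb2uni;
  retain fmtname '$h2u';
  length start $2 label $4;
  do cp = 1488 to 1514;
    label = put(cp, hex4.);                    /* e.g. 05D0          */
    start = unicode(cats('\u', label), 'ESC'); /* the letter itself  */
    output;
  end;
  keep fmtname start label;
run;

proc format lib=work cntlin=heb2uni;
run;
```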


METHOD 3 - Doing the hard work:

/* translate Hebrew text to UNICODE */
/* SEE ALSO: targilim/ as alternatives */

%let kbd_aleph = א;    /* RETYPE with local keyboard */
%let uni_aleph = 1488; /* ALEF Unicode code point, U+05D0 */
%let ch_len = 2;

data Heb_Table;
  length string_in string_out $130     /* Hebrew 2 bytes/char include spaces */
         aleph $2;
  retain aleph "&kbd_aleph" delta;
  if _N_=1 then do;
     /* DELTA = (aleph's two bytes read as one number) - aleph's code point */
     delta = rank(substr(aleph,1,1))*256 + rank(substr(aleph,2,1)) - &uni_aleph;
     put DELTA=;
  end;
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char120.;
  len = length(string_in);
  string_out = '';
  i = 1;                              /* byte position in string_in */
  do while(i le len);
     /* a byte equal to aleph's lead byte starts a 2-byte Hebrew letter */
     if substr(string_in,i,1) = substr(aleph,1,1)
        then by=2;
        else by=1;
     char = substr(string_in,i,by);
     if by=2 then do;
        cn = rank(substr(char,1,1))*256 + rank(substr(char,2,1)) - delta; /* &uni_aleph; */
        ch = put(byte(int(cn/256)) || byte(int(cn-int(cn/256)*256)), $hex4.);
     end;
     else ch = put(strip(char), $hex2.);
     string_out = cats(string_out,ch);
     i = i + by;
  end;
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines;     /*** use TAB between VARNAME and the string to translate ***/
ahuzim	אחוזים
std1	+ 2 סטיות תקן
std2	+ סטית תקן
std3	- סטית תקן
std4	- 2 סטיות תקן
avg		ממוצע
mikra	מקרא
month	חודש
total	סה""כ
;
run;
%put ahuzim = &ahuzim;
%put std1 = &std1;
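The DELTA step in Method 3 works because, in a UTF-8 session, every Hebrew letter from U+05D0 to U+05EA is stored as a byte pair from D7 90 to D7 AA. Reading that pair as one 16-bit number therefore gives a value that is off from the code point by a constant. Here is the arithmetic worked through for aleph and tav:

```latex
\begin{aligned}
\delta &= \mathtt{0xD7}\cdot 256 + \mathtt{0x90} - 1488 = 55184 - 1488 = 53696,\\
\text{tav: } \mathtt{0xD7}\cdot 256 + \mathtt{0xAA} - \delta &= 55210 - 53696 = 1514 = \mathtt{0x05EA}.
\end{aligned}
```

In a single-byte WHebrew session aleph is only one byte, so this arithmetic assumes the UTF-8 session discussed above.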

METHOD 4 - I intended to create a macro for translation and got stuck with the following error:


ERROR: Maximum level of nesting of macro functions exceeded.


@JonathanNitzan, it was an enjoyable challenge for me.

We both learned a lot from it.

Regards, Shmuel



