Hi Eyal:
I ran your code (after retyping the Hebrew letters) and this is the log I get:
2034 data _null_;
2035 win1255name = "??? ???";
2036 put win1255name $hex20.;
2037 /* convert to Hebrew DOS */
2038 pcoemname = kcvt(win1255name,"pcoem862");
2039 put pcoemname $hex20.;
2040 /* convert to UTF8 */
2041 utfname = kcvt(win1255name,"utf8");
2042 put utfname $hex20.;
2043 /* convert to Unicode NCR */
2044 utf8ncr = unicodec(win1255name,"NCR");
2045 put utf8ncr ;
2046 /* convert to Unicode ESC */
2047 utf8esc = unicodec(win1255name,"ESC");
2048 put utf8esc ;
2049 /* convert back to Hebrew using unicode or kcvt */
2050 win1255name2 = unicode(utfname,"utf8");
2051 put win1255name2 $hex20.;
2052 run;
3F3F3F203F3F3F
3F3F3F203F3F3F202020
3F3F3F203F3F3F202020
??? ???
??? ???
3F3F3F203F3F3F202020
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1. The code I posted assumes your SAS session is invoked with encoding=Hebrew (on Linux/Unix) or encoding=Whebrew (on Windows). You can verify by running "proc options option=encoding;run;" in your SAS session.
2. The code I posted shows you the various SAS functions you can use to convert non-Unicode text to Unicode (UTF8). To run the code, make sure you place valid Hebrew text in the field "win1255name". In the log you sent, this field's value contains question marks instead of Hebrew characters. Correct the question marks and rerun. I hope one of the functions in the code is the one you are looking for.
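As a small sketch of the check described in point 1, the snippet below prints the session encoding in two ways. It assumes nothing beyond a running SAS 9.4 session; SYSENCODING is the automatic macro variable that holds the session encoding name.

```sas
/* Show the ENCODING system option in the log */
proc options option=encoding;
run;

/* Same information through the automatic macro variable */
%put NOTE: Session encoding is &sysencoding;
```

If the log reports wlatin1 rather than a Hebrew or UTF-8 encoding, Hebrew literals typed into the program are likely to arrive as question marks, as in the log above.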
Eyal
1. I have sorted the encoding issue (with your help and SAS' customer support). The problem was that my SAS session was invoked with the Target:
"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
I changed the "en" to "u8", and I now have the needed "ENCODING=UTF-8".
2. As I indicated, there are so many postings now, that I find them hard to sort out. So let's start with your suggestion, Eyal:
data _null_;
win1255name = "אבג דהו";
put win1255name $hex20.;
/* convert to Hebrew DOS */
pcoemname = kcvt(win1255name,"pcoem862");
put pcoemname $hex20.;
/* convert to UTF8 */
utfname = kcvt(win1255name,"utf8");
put utfname $hex20.;
/* convert to Unicode NCR */
utf8ncr = unicodec(win1255name,"NCR");
put utf8ncr ;
/* convert to Unicode ESC */
utf8esc = unicodec(win1255name,"ESC");
put utf8esc ;
/* convert back to Hebrew using unicode or kcvt */
win1255name2 = unicode(utfname,"utf8");
put win1255name2 $hex20.;
run;
Having run this code, I get the following Log.
5
6 data _null_;
7 win1255name = "אבג דהו";
8 put win1255name $hex20.;
9 /* convert to Hebrew DOS */
10 pcoemname = kcvt(win1255name,"pcoem862");
11 put pcoemname $hex20.;
12 /* convert to UTF8 */
13 utfname = kcvt(win1255name,"utf8");
14 put utfname $hex20.;
15 /* convert to Unicode NCR */
16 utf8ncr = unicodec(win1255name,"NCR");
17 put utf8ncr ;
18 /* convert to Unicode ESC */
19 utf8esc = unicodec(win1255name,"ESC");
20 put utf8esc ;
21 /* convert back to Hebrew using unicode or kcvt */
22 win1255name2 = unicode(utfname,"utf8");
23 put win1255name2 $hex20.;
24 run;
D790D791D79220D793D7
80818220838485202020
D790D791D79220D793D7
אבג דהו
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5
D790D791D79220D793D7
Of the 6 different options, the (almost) correct function is:
utf8esc = unicodec(win1255name,"ESC");
Which produces:
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5
However, this function adds (1) "\" and "u" in front of each code point, additions that my %label statement doesn't understand; and (2) an actual space between the two Hebrew strings instead of the Unicode code point "0020" for a space.
3. Is there a way to "clean" the resulting string so that it contains only the necessary code points? I won't be surprised if Shmuel has already solved it in his posts, but I'd appreciate a fresh summary instead of having to sort it out myself.
Thanks,
Jonathan
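One possible way to get the "clean" string asked about in point 3 is sketched below. It is not taken from any post in this thread; it assumes a UTF-8 session, text limited to the Basic Multilingual Plane, and that KLENGTH ignores trailing blanks the way LENGTH does. The variable names (s, out, c, esc) are made up for the illustration. Each character is handled on its own, so spaces and other ASCII characters also come out as 4-digit hex code points.

```sas
data _null_;
  length s $80 out $400 c $4 esc $12;
  s = "אבג דהו";
  out = '';
  do i = 1 to klength(s);            /* loop over characters, not bytes */
    c = ksubstr(s, i, 1);            /* one (possibly multi-byte) character */
    if c = ' ' then out = cats(out, '0020');   /* CATS would drop a blank */
    else do;
      esc = unicodec(c, 'ESC');      /* non-ASCII becomes \uXXXX */
      if esc =: '\u' then out = cats(out, substr(esc, 3, 4));
      else out = cats(out, put(rank(c), hex4.)); /* ASCII: code as 4 hex digits */
    end;
  end;
  put out=;
run;
```

The result is meant to be used the same way as the hand-typed hex strings, for example as "&myvar"x in a %label call once it has been stored in a macro variable.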
Eyal: This is the best "under your nose" line I have read for a while! It works like a charm, as it should.
Shmuel: Thank you for all the work, which is truly appreciated, regardless.
Jonathan
@JonathanNitzan I'm curious to know whether, now that you run SAS with encoding=UTF-8, you still need the translation from UTF-8 to UNICODE. Maybe you can just write the Hebrew string as is:
%label(2,98,"אחוזים"x,black,0,0,3.2,'Arial/Unicode/italic',6);
In case translation is still needed, you can even use the unicodec() function in a macro, as in the following log:
73 %macro uni(str);
74 %local x;
75 %let x=%sysfunc(unicodec(&str));
76 %sysfunc(compress(&x,\u));
77 %mend;
78 data _null_;
79 text= "*** Testing - %uni(אחוזים) ***";
80 put text=;
81 run;
text=*** Testing - 05D005D705D505D605D905DD; ***
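The semicolon that shows up after the hex string in the log above comes from the semicolon that follows %sysfunc(compress(...)) inside the macro: it is emitted as part of the generated text. A minimal variation, assuming the same UTF-8 session and the same input, simply leaves that semicolon out so the macro returns only the hex digits:

```sas
%macro uni(str);
  %local x;
  %let x = %sysfunc(unicodec(&str));
  /* No semicolon here: the macro should resolve to the hex string only */
  %sysfunc(compress(&x,\u))
%mend uni;

data _null_;
  text = "*** Testing - %uni(אחוזים) ***";
  put text=;
run;
```

With this change the text should read the same as in the log above, minus the stray semicolon. Note that compress() also removes any literal "u" or "\" in the input, so this sketch is only suitable for strings without those characters.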
@EyalGonen , Thanks, you were very helpful.
@JonathanNitzan , You are almost at the finish line.
I am going to summarize this thread.
1) The thread deals with two different issues:
1.1 Encoding issue - the session ENCODING had to be changed from WLATIN1 to UTF-8.
You solved it by changing the SAS invocation command:
"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
replacing en with u8 in the configuration path.
1.2 Converting Hebrew strings from UTF-8 to Hebrew UNICODE to be used with SAS Graphs.
I proposed three different solutions for it.
I shall present them here again in my order of preference:
METHOD 1 - Recommended
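data test;
  /* Each 2-byte Hebrew character expands to a 6-byte \uXXXX escape, so allow 3x the input length */
  length string_in $80 string_out $240;
  infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
  input varname $ string_in :$char80.;
  /* unicodec() returns \uXXXX escapes; compress() strips the "\" and "u", leaving the hex code points. */
  /* Spaces, digits, + and - pass through unchanged. */
  string_out = compress(unicodec(string_in), '\u');
  call symput(varname,strip(string_out));
  keep varname string_out;
datalines; /*** use TAB between VARNAME and the string to translate ***/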
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה""כ
;
run;
%put Ahuzim = &ahuzim;
%put STD1 = &std1;
%put MIKRA = &mikra;
%put TOTAL = &total;
METHOD 2 - using format for translating:
data heb2uni;
retain fmtname '$h2u';
length heb $2 uni $4;
infile cards;
input uni $ heb $;
cards;
05D0 א
05D1 ב
05D2 ג
05D3 ד
05D4 ה
05D5 ו
05D6 ז
05D7 ח
05D8 ט
05D9 י
05DA ך
05DB כ
05DC ל
05DD ם
05DE מ
05DF ן
05E0 נ
05E1 ס
05E2 ע
05E3 ף
05E4 פ
05E5 ץ
05E6 צ
05E7 ק
05E8 ר
05E9 ש
05EA ת
; run;
proc format lib=work
cntlin=heb2uni(rename=(heb=start uni=label));
run;
data HebTable;
length hebstr unistr $80 ch $4;
infile cards dlm='09'x truncover;
input varname $ hebstr $;
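unistr = ''; hebstr = strip(hebstr);
/* Steps 2 bytes at a time, so it assumes every character is a 2-byte Hebrew letter; */
/* spaces, digits and signs (as in std1-std4) shift the alignment and are not mapped correctly. */
do i=1 to length(hebstr) by 2;
ch = put(substr(hebstr,i,2), $h2u.);
unistr = cats(unistr,ch);
end;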
keep varname hebstr unistr;
putlog varname= hebstr= unistr=;
call symput(varname,strip(unistr));
cards;
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה""כ
;
run;
%put AHUZIM = &ahuzim;
%put STD1 = &std1;
%put AVG = &avg;
%put TOTAL = &total;
/*** usage example ***/
data _null_;
txt1 = "(xAHUZIM= &ahuzim xxx)";
put txt1=;
txt2 = "(xTOTAL= &total xxx)";
put txt2=;
run;
METHOD 3 - Doing the hard work:
/* translate Hebrew text to UNICODE */
/* SEE ALSO: targilim/Heb2Unicodec.sas Heb2Uni.sas as alternatives */
%let kbd_aleph = א; /* RETYPE with local keyboard */
%let uni_aleph = 1488; /* ALEF Unicode code point, decimal (hex 05D0) */
%let ch_len = 2;
data Heb_Table;
length string_in string_out $130 /* Hebrew 2 bytes/char include spaces */
aleph $2;
retain aleph "&kbd_aleph" delta;
if _N_=1 then do;
delta = rank(substr(aleph,1,1))*256 + rank(substr(aleph,2,1)) - &uni_aleph;
put DELTA=;
end;
infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
input varname $ string_in :$char120.;
len = length(string_in);
string_out = '';
i=1;
do while(i le len); /* "do until(i ge len)" would skip a final 1-byte character */
if substr(string_in,i,1) = substr(aleph,1,1)
then by=2;
else by=1;
char = substr(string_in,i,by);
if by=2 then do;
cn = rank(substr(char,1,1))*256 + rank(substr(char,2,1)) - delta; /* &uni_aleph; */
ch = put(byte(int(cn/256)) || byte(int(cn-int(cn/256)*256)), $hex4.);
end;
else ch = put(strip(char), $hex2.);
string_out = cats(string_out,ch);
i = i + by;
end;
call symput(varname,strip(string_out));
keep varname string_out;
datalines; /*** use TAB between VARNAME and the string to translate ***/
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה""כ
;
run;
%put ahuzim = &ahuzim;
%put std1 = &std1;
METHOD 4 - I intended to create a macro for the translation but got stuck on the following error:
ERROR: Maximum level of nesting of macro functions exceeded.
@JonathanNitzan , It was an enjoyable challenge for me.
We both learned a lot from it.
Regards, Shmuel