Hi Eyal:
I ran your code (after retyping the Hebrew letters) and this is the log I get:
2034 data _null_;
2035 win1255name = "??? ???";
2036 put win1255name $hex20.;
2037 /* convert to Hebrew DOS */
2038 pcoemname = kcvt(win1255name,"pcoem862");
2039 put pcoemname $hex20.;
2040 /* convert to UTF8 */
2041 utfname = kcvt(win1255name,"utf8");
2042 put utfname $hex20.;
2043 /* convert to Unicode NCR */
2044 utf8ncr = unicodec(win1255name,"NCR");
2045 put utf8ncr ;
2046 /* convert to Unicode ESC */
2047 utf8esc = unicodec(win1255name,"ESC");
2048 put utf8esc ;
2049 /* convert back to Hebrew using unicode or kcvt */
2050 win1255name2 = unicode(utfname,"utf8");
2051 put win1255name2 $hex20.;
2052 run;
3F3F3F203F3F3F
3F3F3F203F3F3F202020
3F3F3F203F3F3F202020
??? ???
??? ???
3F3F3F203F3F3F202020
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
1. The code I posted assumes your SAS session is invoked with encoding=Hebrew (on Linux/Unix) or encoding=Whebrew (on Windows). You can verify by running "proc options option=encoding;run;" in your SAS session.
2. The code I posted shows the various SAS functions you can use to convert non-Unicode text to Unicode (UTF-8). To run the code, make sure you place valid Hebrew text in the variable "win1255name". In the log you sent, this variable's value contains question marks instead of Hebrew characters. Replace the question marks with Hebrew text and rerun. I hope one of the functions in the code is the one you are looking for.
Eyal
1. I have sorted the encoding issue (with your help and SAS' customer support). The problem was that my SAS session was invoked with the Target:
"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
I changed the "en" to "u8", and I now have the needed "ENCODING=UTF-8".
2. As I indicated, there are so many postings now, that I find them hard to sort out. So let's start with your suggestion, Eyal:
data _null_;
win1255name = "אבג דהו";
put win1255name $hex20.;
/* convert to Hebrew DOS */
pcoemname = kcvt(win1255name,"pcoem862");
put pcoemname $hex20.;
/* convert to UTF8 */
utfname = kcvt(win1255name,"utf8");
put utfname $hex20.;
/* convert to Unicode NCR */
utf8ncr = unicodec(win1255name,"NCR");
put utf8ncr ;
/* convert to Unicode ESC */
utf8esc = unicodec(win1255name,"ESC");
put utf8esc ;
/* convert back to Hebrew using unicode or kcvt */
win1255name2 = unicode(utfname,"utf8");
put win1255name2 $hex20.;
run;
Having run this code, I get the following Log.
5
6 data _null_;
7 win1255name = "אבג דהו";
8 put win1255name $hex20.;
9 /* convert to Hebrew DOS */
10 pcoemname = kcvt(win1255name,"pcoem862");
11 put pcoemname $hex20.;
12 /* convert to UTF8 */
13 utfname = kcvt(win1255name,"utf8");
14 put utfname $hex20.;
15 /* convert to Unicode NCR */
16 utf8ncr = unicodec(win1255name,"NCR");
17 put utf8ncr ;
18 /* convert to Unicode ESC */
19 utf8esc = unicodec(win1255name,"ESC");
20 put utf8esc ;
21 /* convert back to Hebrew using unicode or kcvt */
22 win1255name2 = unicode(utfname,"utf8");
23 put win1255name2 $hex20.;
24 run;
D790D791D79220D793D7
80818220838485202020
D790D791D79220D793D7
אבג דהו
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5
D790D791D79220D793D7
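The hex lines in this log can be cross-checked outside SAS. The following is a minimal Python sketch, not SAS code: the helper name hex20 is hypothetical, and it assumes that $hex20. prints the first 10 bytes of a blank-padded character variable. Under that assumption, the first line of the log is the UTF-8 encoding of the string (cut off after 10 bytes), and the second line is its Hebrew DOS (code page 862) encoding.

```python
# Cross-check of the hex dumps in the log above (illustration only, not SAS).
# The text is written with escapes to avoid right-to-left editing problems.
text = "\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5"  # alef bet gimel, space, dalet he vav


def hex20(raw: bytes) -> str:
    """Assumed behaviour of PUT var $hex20.: the first 10 bytes in hex,
    with short values padded by blanks (0x20)."""
    return raw.ljust(10, b" ")[:10].hex().upper()


utf8_line = hex20(text.encode("utf-8"))      # 13 bytes, truncated to 10
cp862_line = hex20(text.encode("cp862"))     # 7 bytes, padded to 10
cp1255_line = hex20(text.encode("cp1255"))   # what a Windows Hebrew session would hold

print(utf8_line)    # D790D791D79220D793D7
print(cp862_line)   # 80818220838485202020
print(cp1255_line)  # E0E1E220E3E4E5202020
```

Since the very first PUT already shows D790..., the variable named win1255name appears to hold UTF-8 bytes in this session rather than Windows-1255 bytes.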
Of the 6 different options, the (almost) correct function is:
utf8esc = unicodec(win1255name,"ESC");
Which produces:
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5
However, this function adds (1) "\" and "u" in front of the unicodes, additions that my %label statement doesn't understand; and (2) an actual space between the two Hebrew strings instead of the unicode "0020" for a space.
3. Is there a way to "clean" the resulting string so that it contains only the necessary Unicode code points? I won't be surprised if Shmuel has already solved this in his posts, but I'd appreciate a fresh summary instead of having to sort it out myself.
Thanks,
Jonathan
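The clean-up asked for here can be sketched outside SAS. The following is a minimal Python illustration, not a SAS solution: the function name esc_to_hex is hypothetical, and it assumes that the target is four hex digits per character, with the space written as 0020, as described in the post above.

```python
import re

# Matches one escape such as \u05D0 in the output of unicodec(..., "ESC").
_ESC = re.compile(r"\\u([0-9A-Fa-f]{4})")


def esc_to_hex(esc: str) -> str:
    """Turn an ESC-style string into a run of 4-digit hex code points.

    Escapes such as \\u05D0 keep their four hex digits; every other
    character (for example a plain space) is replaced by its own
    4-digit code point, so a space becomes 0020.
    """
    out = []
    pos = 0
    for m in _ESC.finditer(esc):
        # Characters between two escapes were left as-is; encode them too.
        out.extend(f"{ord(c):04X}" for c in esc[pos:m.start()])
        out.append(m.group(1).upper())
        pos = m.end()
    out.extend(f"{ord(c):04X}" for c in esc[pos:])
    return "".join(out)


print(esc_to_hex(r"\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5"))
# 05D005D105D2002005D305D405D5
```

Matching the whole \uXXXX escape, rather than deleting every "\" and "u", keeps a literal letter u in any Latin part of a label intact.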
Eyal: This is the best "under your nose" line I have read in a while! It works like a charm, as it should.
Shmuel: Thank you for all the work, which is truly appreciated, regardless.
Jonathan
@JonathanNitzan I'm curious to know whether, now that you run SAS with encoding=UTF-8, you still need the translation from UTF-8 to Unicode. Maybe you can just write the Hebrew string as is:
%label(2,98,"אחוזים"x,black,0,0,3.2,'Arial/Unicode/italic',6);
In case translation is still needed, you can even use the unicodec() function in a macro, as in the following log:
73 %macro uni(str);
74 %local x;
75 %let x=%sysfunc(unicodec(&str));
76 %sysfunc(compress(&x,\u));
77 %mend;
78 data _null_;
79 text= "*** Testing - %uni(אחוזים) ***";
80 put text=;
81 run;
text=*** Testing - 05D005D705D505D605D905DD; ***
@EyalGonen , Thanks, You were very helpful.
@JonathanNitzan , You are almost at the end.
I am going to summarize this thread.
1) The thread deals with two different issues:
1.1 Encoding issue - the session ENCODING had to be changed from WLATIN1 to UTF-8.
You solved it by changing the SAS invocation command:
"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
replacing en with u8 in the path to sasv9.cfg.
1.2 Converting Hebrew strings from UTF-8 to Hebrew Unicode code points to be used with SAS graphs.
I proposed three different solutions for it.
I shall present them here again in my order of preference:
METHOD 1 - Recommended
data test;
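length string_in $80 string_out $200; /* name both variables explicitly; the prefix list strin: matches no existing variable at this point */
infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */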
input varname $ string_in :$char80.;
string_out = compress(unicodec(string_in), '\u');
call symput(varname,strip(string_out));
keep varname string_out;
datalines; /*** use TAB between VARNAME and the string to translate ***/
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה""כ
;
run;
%put Ahuzim = &ahuzim;
%put STD1 = &std1;
%put MIKRA = &mikra;
%put TOTAL = &total;
METHOD 2 - using format for translating:
data heb2uni;
retain fmtname '$h2u';
length heb $2 uni $4;
infile cards;
input uni $ heb $;
cards;
05D0 א
05D1 ב
05D2 ג
05D3 ד
05D4 ה
05D5 ו
05D6 ז
05D7 ח
05D8 ט
05D9 י
05DA ך
05DB כ
05DC ל
05DD ם
05DE מ
05DF ן
05E0 נ
05E1 ס
05E2 ע
05E3 ף
05E4 פ
05E5 ץ
05E6 צ
05E7 ק
05E8 ר
05E9 ש
05EA ת
; run;
proc format lib=work
cntlin=heb2uni(rename=(heb=start uni=label));
run;
data HebTable;
length hebstr unistr $80 ch $4;
infile cards dlm='09'x truncover;
input varname $ hebstr $;
unistr = ''; hebstr = strip(hebstr);
do i=1 to length(hebstr) by 2;
ch = put(substr(hebstr,i,2), $h2u.);
unistr = cats(unistr,ch);
end;
keep varname hebstr unistr;
putlog varname= hebstr= unistr=;
call symput(varname,strip(unistr));
cards;
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה""כ
;
run;
%put AHUZIM = &ahuzim;
%put STD1 = &std1;
%put AVG = &avg;
%put TOTAL = &total;
/*** usage example ***/
data _null_;
txt1 = "(xAHUZIM= &ahuzim xxx)";
put txt1=;
txt2 = "(xTOTAL= &total xxx)";
put txt2=;
run;
METHOD 3 - Doing the hard work:
/* translate Hebrew text to UNICODE */
/* SEE ALSO: targilim/Heb2Unicodec.sas Heb2Uni.sas as alternatives */
%let kbd_aleph = א; /* RETYPE with local keyboard */
%let uni_aleph = 1488; /* ALEF Unicode code point (decimal 1488 = hex 05D0) */
%let ch_len = 2;
data Heb_Table;
length string_in string_out $130 /* Hebrew 2 bytes/char include spaces */
aleph $2;
retain aleph "&kbd_aleph" delta;
if _N_=1 then do;
delta = rank(substr(aleph,1,1))*256 + rank(substr(aleph,2,1)) - &uni_aleph;
put DELTA=;
end;
infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
input varname $ string_in :$char120.;
len = length(string_in);
string_out = '';
i=1;
do until(i gt length(string_in)); /* gt, not ge: otherwise a final single-byte character is skipped */
if substr(string_in,i,1) = substr(aleph,1,1)
then by=2;
else by=1;
char = substr(string_in,i,by);
if by=2 then do;
cn = rank(substr(char,1,1))*256 + rank(substr(char,2,1)) - delta; /* &uni_aleph; */
ch = put(byte(int(cn/256)) || byte(int(cn-int(cn/256)*256)), $hex4.);
end;
else ch = put(strip(char), $hex2.);
string_out = cats(string_out,ch);
i = i + by;
end;
call symput(varname,strip(string_out));
keep varname string_out;
datalines; /*** use TAB between VARNAME and the string to translate ***/
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה""כ
;
run;
%put ahuzim = &ahuzim;
%put std1 = &std1;
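The delta arithmetic in METHOD 3 can be checked outside SAS. The following is a minimal Python sketch, not part of the SAS program: the name via_delta is hypothetical, and the sketch assumes a UTF-8 session, where each Hebrew letter is stored as two bytes. In UTF-8 all code points from U+05C0 to U+05FF share the lead byte D7, and the second byte grows by one per code point, so a single constant delta computed from alef is enough for the whole alphabet.

```python
# Illustration of the METHOD 3 arithmetic (not SAS code).
ALEPH = "\u05D0"                      # the letter alef, code point 1488 (hex 05D0)
lead, trail = ALEPH.encode("utf-8")   # the two UTF-8 bytes: 0xD7, 0x90

# Same formula as the DATA step: rank(byte1)*256 + rank(byte2) - &uni_aleph
delta = lead * 256 + trail - 1488


def via_delta(ch: str) -> str:
    """Code point of a 2-byte Hebrew letter, recovered with the constant delta."""
    b0, b1 = ch.encode("utf-8")
    return f"{b0 * 256 + b1 - delta:04X}"


print(delta)                 # 53696
print(via_delta("\u05D1"))   # 05D1 (bet)
print(via_delta("\u05EA"))   # 05EA (tav)
```

The constant holds only while the lead byte stays D7. Hebrew vowel points (U+05B0 to U+05BF) use the lead byte D6 and would need a different delta, and the DATA step above would not even detect them as 2-byte characters.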
METHOD 4 - I intended to create a macro for the translation but got stuck on the following error:
ERROR: Maximum level of nesting of macro functions exceeded.
@JonathanNitzan , It was an enjoyable challenge for me.
We both learned a lot from it.
Regards, Shmuel