Greetings:
(I'm using SAS 9.4 TS Level 1M3 on X64_8HOME.)
I’m trying to include Hebrew text in an Annotate data set called ‘panel1’ below:
***
data panel1;
length function color $ 8 STYLE $ 18 text $ 100;
hsys='1'; xsys='1'; ysys='1';
%label(2,98,'05d005d705d505d605d905dd'x,black,0,0,3.2,'Arial/Unicode/italic',6);
run;
***
When included in Proc GPLOT the label statement will generate the following italicized Hebrew string:
אחוזים
The problem is that, when I have a lot of text, this method is very labour intensive. Is there a simple routine I can use in which I type the Hebrew string in the program and the routine converts it to Unicode?
Thank you.
I have done some work on this in the past. The attached program may give you a hint.
Do you need to translate short strings (messages, labels, etc.), a large amount of text given in a .txt file, or both? Let me know some more details of your needs and I will be happy to help.
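/* Build a lookup table: each Hebrew letter (2 bytes in a UTF-8 session) */
/* paired with its UTF-8 bytes written as =D7=xx                           */
data heb_encode(keep= ot str);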
retain i 1;
heb = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת';
infile cards dlm='!';
input str $6.;
ot = substr(heb,i,2);
i+2;
output;
return;
cards;
=D7=90
=D7=91
=D7=92
=D7=93
=D7=94
=D7=95
=D7=96
=D7=97
=D7=98
=D7=99
=D7=9B
=D7=9A
=D7=9C
=D7=9E
=D7=9D
=D7=A0
=D7=9F
=D7=A1
=D7=A2
=D7=A4
=D7=A3
=D7=A6
=D7=A5
=D7=A7
=D7=A8
=D7=A9
=D7=AA
; run;
data cntl;
set heb_encode;
retain fmtname '$hebcvf';   /* Hebrew letter -> =D7=xx code */
rename ot = start str=label;
run;
proc format lib=mydata cntlin=cntl; run;
data cntl;
set heb_encode;
retain fmtname '$cvfheb';   /* =D7=xx code -> Hebrew letter */
rename str = start ot=label;
run;
proc format lib=mydata cntlin=cntl; run;
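The two formats can then be applied one letter at a time. A usage sketch, assuming a UTF-8 session (so each Hebrew letter is 2 bytes, as the SUBSTR(heb,i,2) step above already assumes) and that the MYDATA library has been assigned with a LIBNAME statement; the variable names WORD, LETTER and QP are hypothetical:

```sas
/* Assumes: UTF-8 session, library MYDATA assigned, formats created above */
options fmtsearch=(mydata work);          /* let SAS find $HEBCVF. in MYDATA */

data _null_;
  length word $12 letter $2 qp $80;
  word = 'אחוזים';                         /* 6 letters x 2 bytes = 12 bytes */
  qp = '';
  do i = 1 to length(word) by 2;          /* step over 2-byte letters */
    letter = substr(word, i, 2);
    qp = cats(qp, put(letter, $hebcvf.)); /* append the =D7=xx code */
  end;
  put qp=;
run;
```

With the table above, 'אחוזים' should come out as =D7=90=D7=97=D7=95=D7=96=D7=99=D7=9D, and $CVFHEB. reverses the mapping.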
I believe I can adapt the code above to most of your needs.
Thank you Shmuel for the very quick reply and offer of help. To your question:
1. I need to convert short text lines only (to be used as series names, short descriptions of chart items, etc.).
2. If possible, I would like to type the Hebrew text strings in the program itself and have your routine translate them to Unicode to be included in the Annotate data set.
Jonathan
I have started to develop translation code but need some more information:
1) Do I understand correctly that you need macro code to do the translation inside
SAS statements, which means that running a separate DATA step will not be possible?
For example:
title "... %heb2unicode(אחוזים) ... ";
2) You supplied the following line in your first post:
%label(2,98,'05d005d705d505d605d905dd'x,black,0,0,3.2,'Arial/Unicode/italic',6);
Please check:
- would it work correctly with double quotes instead of single quotes?
- would it work with code like:
%let ahuzim = 05d005d705d505d605d905dd;
label(2,98,"&ahuzim"x,black,0,0,3.2,'Arial/Unicode/italic',6);
If so, my idea is to create a macro that generates macro variables with the Unicode hex values you need.
3) ASCII code is one byte per character.
UTF-8 Hebrew code is two bytes per letter.
As far as I know, 'א' is 'D790'x. I don't see that combination in your label line.
Can you check the hex combination of each encoded Hebrew letter?
Try translating the following Hebrew string, using your method:
א ב ג ק ר ש ת
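The two-byte figure in point 3 follows from the UTF-8 rule for code points U+0080 to U+07FF, a range that includes the Hebrew letters: the first byte is binary 110 plus the top 5 bits of the code point, the second is binary 10 plus the low 6 bits. A small worked example (a sketch; 1488 and 1514 are the decimal code points of alef, U+05D0, and tav, U+05EA):

```sas
/* Encode Unicode code points as 2-byte UTF-8.                  */
/* Valid only for U+0080..U+07FF, the range that holds Hebrew.  */
data _null_;
  length utf8 $2;
  do cp = 1488, 1514;                 /* alef U+05D0, tav U+05EA */
    b1 = 192 + int(cp/64);            /* 110xxxxx: top 5 bits    */
    b2 = 128 + mod(cp, 64);           /* 10xxxxxx: low 6 bits    */
    utf8 = byte(b1) || byte(b2);
    put cp= hex4. utf8= $hex4.;
  end;
run;
```

This should print cp=05D0 utf8=D790 and cp=05EA utf8=D7AA: in UTF-8 alef is 'D790'x, while the hex in the label line (05D0...) is the Unicode code point itself.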
Thank you Shmuel. Here are my answers.
1. Yes, I'd like to input the Hebrew letters in the SAS code, similarly to the example you indicate. I'm not sure what you mean by 'running a data step will be not available'.
2. The label statement works correctly with double quotes (" "). Your second line of code returned the following three error messages:
ERROR: Undeclared array referenced: label.
ERROR 22-322: Syntax error, expecting one of the following: +, =.
ERROR 76-322: Syntax error, statement will be ignored.
3. For the Hebrew code I'm using https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjyluKc0dXtAhVNGlkFHWlvCx0Q...
I see that you typed Hebrew letters.
Please run the following code, retyping the Hebrew letters:
data test_heb;
length ch $2 cx $4;
ch = "א";
cx = put(ch,$hex4.);
put ch= cx=;
ch = "ת";
cx = put(ch,$hex4.);
put ch= cx=;
run;
and please post the log of that run.
That will tell me the hex range your environment uses for Hebrew.
I ran your code and this is the resulting log:
1131 data test_heb;
1132 length ch $2 cx $4;
1133 ch = "?";
1134 cx = put(ch,$hex4.);
1135 put ch= cx=;
1136
1137 ch = "?";
1138 cx = put(ch,$hex4.);
1139 put ch= cx=;
1140 run;
ch=? cx=3F20
ch=? cx=3F20
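The value 3F20 can be read byte by byte: '3F'x is the question mark and '20'x is the blank that pads CH to its declared length of 2. The echoed source line already shows ch = "?", so each Hebrew letter had become '?' before the step ran, which usually points to a session encoding that cannot represent Hebrew. A minimal sketch for checking this (it only reports the encoding and decodes the two bytes; it changes nothing):

```sas
/* Report the session encoding in two ways */
%put NOTE: SYSENCODING=&sysencoding;
proc options option=encoding;
run;

/* Decode the two bytes shown as cx=3F20 */
data _null_;
  length pair $2;
  pair = input('3F20', $hex4.);     /* 4 hex digits -> 2 bytes        */
  q = rank(substr(pair, 1, 1));     /* 63 = '?'                       */
  b = rank(substr(pair, 2, 1));     /* 32 = blank padding of CH ($2)  */
  put q= b=;
run;
```

If the encoding reported is a Western one such as WLATIN1, starting SAS with a Hebrew-capable encoding (for example the "SAS 9.4 (Unicode Support)" configuration, if your installation includes it) is one option to try; that is an assumption about your setup.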
1) Technically it is not possible to run a new data step inside a current data step:
data step1;
set anydata;
.... sas statements ...
data _null_;
... any helpful code ...
run;
... more sas statements ...
run;
2) Your answer:
"The label statement works correctly with double quotes (" ").
Your second line of code returned the following three error messages:"
The reason for the error messages: you cannot enter %LET inside a PROC.
Next time, in case of error, it will be helpful to get the full step log, not just
the messages.
Please run the code following this template:
%let ahuzim = ..... ;
proc gplot ... ;
... statements ...
label(2,98,"&ahuzim"x,black,0,0,3.2,'Arial/Unicode/italic',6);
... more statements ...
run;
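Put together, the template amounts to defining the macro variable in open code, before the step that builds the Annotate data set. A sketch, assuming the Annotate macros are compiled with %ANNOMAC and that the /Unicode font style reads the text as two hex bytes per character, as in the original label line:

```sas
%annomac;                                  /* compile %LABEL and the other Annotate macros */
%let ahuzim = 05d005d705d505d605d905dd;    /* open code: outside any DATA or PROC step     */

data panel1;
  length function color $ 8 style $ 18 text $ 100;
  hsys='1'; xsys='1'; ysys='1';            /* coordinate systems as in the original panel1 */
  /* "&ahuzim"x resolves because the hex literal uses double quotes */
  %label(2, 98, "&ahuzim"x, black, 0, 0, 3.2, 'Arial/Unicode/italic', 6);
run;
```

PANEL1 would then be passed to PROC GPLOT with the ANNOTATE= option, as in the original program.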
3) Excellent, now I know the encoding of the translated output you need.
4) Did you retype the Hebrew letters using the keyboard, or just copy the code
that I posted as is? Do you use a Hebrew keyboard to type Hebrew letters?
Two different letters should not show the same hexadecimal value,
cx = 3f20.
1) That's true for PROC too: you can't use a DATA step inside a PROC step, like:
PROC GPLOT;
...... data _NULL_; ... any helpful code ...; run; ..
run;
Shmuel:
1. For the two of us to be on the same page, I enclose here (1) the sas program file, (2) the sas dataset, and (3) the PDF version of the SVG output file (SVG format cannot be uploaded here).
2. Note that the enclosed program successfully uses your %let ahuzim macro.
3. Regarding the test_heb dataset below, I did type in the Hebrew letters as you asked, and the results are shown below.
Thank you,
Jonathan
data test_heb;
length ch $2 cx $4;
ch = "א";
cx = put(ch,$hex4.);
put ch= cx=;
ch = "ת";
cx = put(ch,$hex4.);
put ch= cx=;
run;
3735 data test_heb;
3736 length ch $2 cx $4;
3737 ch = "?";
3738 cx = put(ch,$hex4.);
3739 put ch= cx=;
3740
3741 ch = "?";
3742 cx = put(ch,$hex4.);
3743 put ch= cx=;
3744 run;
ch=? cx=3F20
ch=? cx=3F20
NOTE: The data set WORK.TEST_HEB has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
I worked on two different methods.
I tried to develop macro programs but got stuck.
Regarding the log, I cannot understand why you got ch=? (instead of ch=א / ch=ת);
I ran the same code and expected to see:
1    OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73   data test_heb;
74   length ch $2 cx $4;
75   ch = "א";
76   cx = put(ch,$hex4.);
77   put ch= cx=;
78
79   ch = "ת";
80   cx = put(ch,$hex4.);
81   put ch= cx=;
82   run;
ch=א cx=D790
ch=ת cx=D7AA
Finally, I developed the following code, which may solve your problem.
The program creates macro variables with names you assign,
so you can use them again and again in any graph you generate.
You can %include the program in your autoexec.sas, and all the macro variables
will be available to you throughout the SAS session.
Other benefits are:
- it is easy to add strings to translate
- less coding is needed when developing graphs.
Please try the code below and don't hesitate to post if there is any issue.
Possible issues:
1) special characters that need to be translated
2) the order of substrings (such as: xxx 2 instead of 2 xxx ?! )
/* translate Hebrew text to Unicode */
%let kbd_aleph = א; /* RETYPE with local keyboard */
%let uni_aleph = 1488; /* ALEF Unicode = '05D0'x */
%let ch_len = 2;
data Heb_Table;
length string_in string_out $130 /* Hebrew 2 bytes/char include spaces */
aleph $2;
retain aleph "&kbd_aleph" delta;
if _N_=1 then do;
delta = rank(substr(aleph,1,1))*256 + rank(substr(aleph,2,1)) - &uni_aleph;
put DELTA=;
end;
infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
input varname $ string_in :$char120.;
len = length(string_in);
string_out = '';
i=1;
do until(i ge length(string_in));
if substr(string_in,i,1) = substr(aleph,1,1)
then by=2;
else by=1;
char = substr(string_in,i,by);
if by=2 then do;
cn = rank(substr(char,1,1))*256 + rank(substr(char,2,1)) - delta; /* &uni_aleph; */
ch = put(byte(int(cn/256)) || byte(int(cn-int(cn/256)*256)), $hex4.);
end;
else ch = put(strip(char), $hex2.);
string_out = cats(string_out,ch);
i = i + by;
end;
call symput(varname,strip(string_out));
keep varname string_out;
datalines; /*** use TAB between VARNAME and the string to translate ***/
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה"כ
;
run;
%put ahuzim = &ahuzim;
%put std1 = &std1;
The created macro variables, as displayed in the log:
121  run;
122
123  %put ahuzim = &ahuzim;
ahuzim = 05D005D705D505D605D905DD
124  %put std1 = &std1;
std1 = 2B3205E105D805D905D505EA05EA05E705DF
125
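In this log, std1 comes out as 2B32... : '+' and '2' get 2 hex digits each and the spaces are dropped, while the label line in the first post uses 4 hex digits (2 bytes) per character. For mixed strings, a variant that pads every character to 4 digits may therefore be needed. A sketch under these assumptions: a UTF-8 session, only 1- and 2-byte UTF-8 characters, and no reordering for right-to-left display (issue 2 above is not handled); STD1_HEX is a hypothetical macro variable name:

```sas
/* Convert a UTF-8 string (ASCII + 2-byte letters such as Hebrew) to */
/* 4 hex digits per character. 3- and 4-byte sequences are NOT handled. */
data _null_;
  length s $120 hex $480;
  s = '+ 2 סטיות תקן';                  /* sample string from the std1 line */
  hex = '';
  i = 1;
  do while (i le length(s));             /* LENGTH ignores trailing blanks */
    b1 = rank(substr(s, i, 1));
    if b1 lt 128 then do;                /* 1-byte ASCII: code point = byte */
      cp = b1;
      i = i + 1;
    end;
    else do;                             /* 2-byte UTF-8: 110xxxxx 10xxxxxx */
      b2 = rank(substr(s, i + 1, 1));
      cp = (b1 - 192)*64 + (b2 - 128);
      i = i + 2;
    end;
    hex = cats(hex, put(cp, hex4.));     /* always 4 hex digits per character */
  end;
  call symputx('std1_hex', hex);         /* hypothetical macro variable name */
  put hex=;
run;
```

Spaces become 0020 and digits become 003x, so every character occupies the same 2 bytes that the Arial/Unicode label text expects.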
@EyalGonen, I have copied part of @JonathanNitzan's post, and it seems strange to me.
Why was a typed character, printable in the program editor, replaced by a question mark '?' when displayed in the log?
The program:
data test_heb;
length ch $2 cx $4;
ch = "א";
cx = put(ch,$hex4.);
put ch= cx=;
ch = "ת";
cx = put(ch,$hex4.);
put ch= cx=;
run;
The log:
3735 data test_heb;
3736 length ch $2 cx $4;
3737 ch = "?";
3738 cx = put(ch,$hex4.);
3739 put ch= cx=;
3740
3741 ch = "?";
3742 cx = put(ch,$hex4.);
3743 put ch= cx=;
3744 run;
ch=? cx=3F20
ch=? cx=3F20
NOTE: The data set WORK.TEST_HEB has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Is this the result of an encoding issue, or of some other system option?
Check out the code below. You may also note there is a "SAS Users in Israel" community for Hebrew related questions.
data _null_;
win1255name = "אבג דהו";
put win1255name $hex20.;
/* convert to Hebrew DOS */
pcoemname = kcvt(win1255name,"pcoem862");
put pcoemname $hex20.;
/* convert to UTF8 */
utfname = kcvt(win1255name,"utf8");
put utfname $hex20.;
/* convert to Unicode NCR */
utf8ncr = unicodec(win1255name,"NCR");
put utf8ncr ;
/* convert to Unicode ESC */
utf8esc = unicodec(win1255name,"ESC");
put utf8esc ;
/* convert back to Hebrew using unicode or kcvt */
win1255name2 = unicode(utfname,"utf8");
put win1255name2 $hex20.;
run;
E0E1E220E3E4E5 <- Win 1255 Hebrew
80818220838485202020 <- DOS Hebrew
D790D791D79220D793D7 <- UTF8 Hebrew
E0E1E220E3E4E5202020 <- Win 1255 Hebrew
&#1488;&#1489;&#1490; &#1491;&#1492;&#1493; <- UTF8 NCR
\u05D0\u05D1\u05D2 \u05D3\u05D4\u05D5 <- UTF8 ESC
E0E1E220E3E4E5202020
@JonathanNitzan, if a string in the form of UNICODEC function output, like
92   %put Ahuzim=&ahuzim;
Ahuzim=\u05D0\u05D7\u05D5\u05D6\u05D9\u05DD
is accepted by the %label macro you use, it simplifies the code, and the conversion is maintained by SAS Institute. The adapted code would be:
/*=====================================*/
/* Eyal Gonen - using the unicodec function */
data test;
length string_in $80 string_out $240; /* each 2-byte letter becomes a 6-character \uXXXX escape */
infile datalines truncover dlm='09'x; /* use TAB as delimiter between varname and string */
input varname $ string_in :$char80.;
string_out = unicodec(string_in);
call symput(varname,strip(string_out));
keep varname string_out;
datalines; /*** use TAB between VARNAME and the string to translate ***/
ahuzim אחוזים
std1 + 2 סטיות תקן
std2 + סטית תקן
std3 - סטית תקן
std4 - 2 סטיות תקן
avg ממוצע
mikra מקרא
month חודש
total סה"כ
;
run;
%put Ahuzim=&ahuzim;
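If the %label macro does not accept the \uXXXX form, the escapes can be reduced to the plain hex digits used in the first post's label line. A sketch, assuming a session encoding that can represent the Hebrew letters and an all-Hebrew string (the ESC form leaves characters such as spaces or '+' unescaped, so mixed strings need other handling); AHUZIM_HEX is a hypothetical macro variable name:

```sas
/* Strip the \u prefixes from UNICODEC(...,'ESC') output to get plain hex */
data _null_;
  length esc hex $200;
  esc = unicodec('אחוזים', 'ESC');       /* \u05D0\u05D7\u05D5...        */
  hex = transtrn(esc, '\u', trimn(''));  /* TRIMN('') removes the target */
  call symputx('ahuzim_hex', hex);       /* hypothetical name            */
run;
%put ahuzim_hex=&ahuzim_hex;
```

The result can then be used as "&ahuzim_hex"x in the label call, in the same way as the hand-typed hex string.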