nour_anwar
Fluorite | Level 6

I want to keep only English characters, the hyphen "-", and Arabic characters. This code works fine for the English characters and the hyphen, but it drops the Arabic characters. How can I keep the Arabic characters too?

Here is a screenshot of the code and the output log:

Screenshot (467).png

BrunoMueller
SAS Super FREQ

As I cannot enter Arabic letters on my Swiss German keyboard, I used Unicode code points instead. The program below was run in a UTF-8 SAS session.

Please note the use of KCOMPRESS to deal with multibyte character sets; COMPRESS can only handle single-byte character sets. See here for more details:

https://go.documentation.sas.com/?docsetId=nlsref&docsetTarget=n01wgwo05gbv68n1w1u68i89pbtw.htm&docs...

 

I put together a string with all the characters I want to keep and then use the "KI" modifiers (K = keep the listed characters, I = ignore case).

 

Give it a try:

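data test;
  length someText chars2keep $ 4096;
  /* start with the hyphen, the blank and the Latin letters */
  chars2keep="- abcdefghijklmnopqrstuvwxyz";

  /* append the Arabic letters U+0621 through U+064A */
  do arabicUnicode=0621x to 064Ax;
    arabic=unicode(cats("\u", put(arabicUnicode, hex4.)));
    chars2keep=cats(chars2keep, arabic);
  end;
  putlog chars2keep=;
  someText="*öäü*ab-cd_123_رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي*";
  /* "k" keeps the listed characters instead of removing them, "i" ignores case */
  someText2=kcompress(someText, chars2keep, "ki");
  putlog (_all_) (=/);
run;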

Next time, please provide the code as text so that it can easily be copied and pasted.

 

nour_anwar
Fluorite | Level 6

@BrunoMueller thanks a lot for the help!

I tried your code, but I got this error: ERROR 72-185: The KCOMPRESS function call has too many arguments.

BrunoMueller
SAS Super FREQ

This has to do with the SAS version; see here for more details: https://support.sas.com/kb/63/402.html

You need SAS 9.4M6 for it to work.

 

To better understand how you run SAS, can you provide the SAS log of the code below:

%put NOTE: &=sysvlong &=sysscpl;
proc options group=languagecontrol;
run;

If your SAS session is using a single-byte encoding, you should be able to just use the COMPRESS function.

nour_anwar
Fluorite | Level 6

@BrunoMueller 

 

209 %put NOTE: &=sysvlong &=sysscpl;
NOTE: SYSVLONG=9.04.01M3P062415 SYSSCPL=X64_8HOME
210 proc options group=languagecontrol;
211 run;

 

SAS (r) Proprietary Software Release 9.4 TS1M3


Group=LANGUAGECONTROL
DATESTYLE=DMY Specifies the sequence of month, day, and year when ANYDTDTE, ANYDTDTM, or
ANYDTTME informat data is ambiguous.
DFLANG=ENGLISH Specifies the language for international date informats and formats.
EXTENDOBSCOUNTER=YES
Specifies whether to extend the maximum number of observations in a new SAS
data file.
LOCALEDATA=SASLOCALE
Specifies the location of the locale database.
NOLOGLANGCHG Disables changing the language of the SAS output when the LOCALE= option is
changed.
NOLOGLANGENG Write SAS log messages based on the values of the LOGLANGCHG, LSWLANG=, and
LOCALE= options when SAS started.
LSWLANG=LOCALE Specifies the language for SAS log and ODS messages when the LOCALE= option is
set after SAS starts.
MAPEBCDICTOASCII= Specifies the transcoding table that is used to convert characters from ASCII
to EBCDIC and EBCDIC to ASCII.
NONLDECSEPARATOR Disables formatting of numeric output using the decimal separator for the
locale.
NOODSLANGCHG Disables changing the language of the SAS message text in ODS output when the
LOCALE option is set after start up.
PAPERSIZE=LETTER Specifies the paper size to use for printing.
RSASIOTRANSERROR Displays a transcoding error when illegal values are read from a remote
application.
TIMEZONE= Specifies a time zone.
TRANTAB=(lat1lat1,lat1lat1,wlt1_ucs,wlt1_lcs,wlt1_ccl,,,)
Specifies the translation table catalog entries.
URLENCODING=SESSION
Specifies whether the argument to the URLENCODE function and to the URLDECODE
function is interpreted using the SAS session encoding or UTF-8 encoding.
NODBCS Disables double-byte character sets.
DBCSLANG=NONE Specifies a double-byte character set language.
DBCSTYPE=NONE Specifies the encoding method to use for a double-byte character set.
ENCODING=WLATIN1 Specifies the default character-set encoding for the SAS session.
LOCALE=AR_EG Specifies a set of attributes in a SAS session that reflect the language,
local conventions, and culture for a geographical region.
NONLSCOMPATMODE Encodes data using the SAS session encoding.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

BrunoMueller
SAS Super FREQ

You are running SAS in the wlatin1 encoding.
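For a quicker check than reading the whole LANGUAGECONTROL group, the session encoding can also be queried directly. This is only a small sketch; it uses the automatic macro variable SYSENCODING and the ENCODING system option, and what it prints depends on how your session was started.

```sas
/* Print the session encoding from the automatic macro variable */
%put NOTE: &=sysencoding;

/* Show only the ENCODING system option instead of the whole group */
proc options option=encoding;
run;
```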

 

Try this SAS note https://support.sas.com/kb/18/577.html to configure your SAS session to deal with Arabic letters. Set the LOCALE according to your needs.

 

Since this uses a single-byte character set, you can use the COMPRESS function. Instead of the loop in my sample program that adds the Arabic letters, you can simply type them in.
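A minimal sketch of that single-byte approach, not a tested solution: it assumes the SAS session has been restarted with a single-byte encoding that can represent Arabic (ENCODING=WARABIC is my assumption here; check the SAS note above for your setup). The Arabic letters in the keep list are only a small sample, so you would type in the full set you need.

```sas
data _null_;
  /* Assumption: the session encoding is single-byte and covers Arabic */
  length someText cleaned $ 200;
  someText = "*ab-CD_123_رياضة*";

  /* Keep list: hyphen, blank, Latin letters and a SAMPLE of Arabic letters. */
  /* "k" keeps the listed characters, "i" ignores case.                      */
  cleaned = compress(someText, "- abcdefghijklmnopqrstuvwxyzرياضة", "ki");
  putlog cleaned=;
run;
```

Because COMPRESS works byte by byte, this will only behave as intended when every character in the keep list is stored as a single byte in the session encoding.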

 

Also see here for the supported LOCALE and ENCODING values: https://go.documentation.sas.com/?docsetId=nlsref&docsetTarget=p0yctvkmmbdwl2n1k87qd1nv1nu3.htm&docs...

Patrick
Opal | Level 21

If you can't use the 3rd argument of the kcompress() function, then I guess you need to do something "ugly" along the lines of the code below.

data validChars;
  input validChars :$4.;
  output;
  if validChars ne upcase(validChars) then
    do;
      validChars = upcase(validChars);
      output;
    end;
  datalines;
a
b
d
e
_
;

data have;
  infile datalines truncover;
  input have_str $20.;
  datalines;
abc def ghi
acccb ss_s ghbik
abdd ee ffee
;


data want(drop=validChars _:);
  if _n_=1 then
    do;
      if 0 then set validChars;
      dcl hash h1(dataset:'validChars');
      h1.defineKey('validChars');
      h1.defineDone();
    end;
  
  set have;
  want_str=have_str;
  want_str=kcompress(want_str);
  _l=klength(want_str);
  length _char $4;
  do _i=1 to _l;
    _char=KSUBSTR(want_str, _i,1);
    if _char ne ' ' and h1.check(key:_char) ne 0 then 
      want_str=KTRANSLATE(want_str,' ',_char);
  end;
  want_str=kcompress(want_str);
run;

proc print;
run;


 

ChrisNZ
Tourmaline | Level 20

> The KCOMPRESS function call has too many arguments.

You can still use function kcompress() with 2 arguments; just list all the unwanted characters as the second argument.

A bit more tedious than listing the characters you want to keep, but still valid.
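As a rough sketch of what that looks like (the sample string and the list of unwanted characters are made up for illustration), the two-argument form removes every character that appears in the second argument:

```sas
data _null_;
  length someText cleaned $ 200;
  someText = "*ab-cd_123*";

  /* Two-argument form: the second argument lists the characters to REMOVE */
  cleaned = kcompress(someText, "*_0123456789");
  putlog cleaned=;
run;
```

The drawback is that any unwanted character you forget to list stays in the result, so the removal list has to cover everything that can occur in your data.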

andreas_lds
Jade | Level 19

This seems to be the same problem that was discussed last week in https://communities.sas.com/t5/SAS-Programming/RegEx-and-Arabic-letters/m-p/664689.
