NonEnglish Character

Super Contributor
Posts: 717

NonEnglish Character

How can I set a flag if a string contains a non-English character?

PROC Star
Posts: 1,400

Re: NonEnglish Character

By a non-English character, do you mean any character that is not in a-z?

Super Contributor
Posts: 717

Re: NonEnglish Character

Yes. It has some Chinese characters.

PROC Star
Posts: 1,400

Re: NonEnglish Character

Do something like this

 

data have;
   length string $20;   /* otherwise string is $3 and "ab人物" gets truncated */
   string="abc";output;
   string="ab人物";output;
   string="xyz";output;
run;

data want;
   set have;
   flag=ifn(notalpha(string),1,0);
run;
Super Contributor
Posts: 717

Re: NonEnglish Character

flag is set to 1 for all values
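
A likely explanation, assuming the real variable is longer than the values it holds: NOTALPHA treats the trailing blanks that pad a character variable as non-alphabetic, so it returns a nonzero position even for purely alphabetic values. A minimal sketch of the effect (the DEMO data set and the flag variable names are made up for illustration):

```sas
data demo;
   length string $20;   /* longer than the value, so it is blank-padded */
   string = "abc";
   /* the first trailing blank sits at position 4, so this flag is 1 */
   flag_padded = notalpha(string) > 0;
   /* STRIP removes the padding before the test, so this flag is 0 */
   flag_stripped = notalpha(strip(string)) > 0;
run;
```

Stripping the value first avoids the false positives, but any embedded blank, digit, or punctuation mark would still be flagged.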

PROC Star
Posts: 1,400

Re: NonEnglish Character

[ Edited ]

Ok. Does this work for you?

 

data have;
   length string $20;   /* otherwise string is $3 and "ab人物" gets truncated */
   string="abc";output;
   string="ab人物";output;
   string="xyz";output;
run;

data want;
   set have;
   flag=ifn(lengthn(compress(string, "abcdefghijklmnopqrstuvwxyz", "i")),1,0);
run;

EDIT: I added an IFN function to the code.

Super Contributor
Posts: 717

Re: NonEnglish Character

I cannot extend it to the following; it fails:

 

 flag=lengthn(catx("-",string1, string2),"abcdefghijklmnopqrstuvwxyz", "i");
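
The statement above most likely fails because the COMPRESS call was dropped: LENGTHN accepts a single argument, so the two extra arguments cause an error. Here is a minimal sketch of one way to extend the earlier approach to two variables, assuming they are named string1 and string2 as in the statement above (the HAVE2 data set is made up for illustration). The hyphen and the blank are added to the removal list so that the CATX delimiter is not itself counted as a non-English character:

```sas
data have2;
   length string1 string2 $20;
   string1 = "abc"; string2 = "xyz";    output;
   string1 = "abc"; string2 = "ab人物"; output;
run;

data want2;
   set have2;
   /* join the two values, remove a-z (case-insensitive), hyphens and blanks,
      then flag the row if anything is left over */
   flag = lengthn(compress(catx("-", string1, string2),
                           "abcdefghijklmnopqrstuvwxyz -", "i")) > 0;
run;
```

With this sample, the first row should get flag=0 and the second flag=1. Digits and other punctuation would still be flagged, as in the single-variable version.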

Super User
Posts: 2,061

Re: NonEnglish Character

@draycut, your code can be tweaked to:

 

data have;
   length string $20;   /* otherwise string is $3 and "ab人物" gets truncated */
   string="abc";output;
   string="ab人物";output;
   string="xyz";output;
run;

data want;
   set have;
   flag=lengthn(compress(string, " ", "ai"))>0;
/*   flag=ifn(lengthn(compress(string, "abcdefghijklmnopqrstuvwxyz", "i")),1,0);*/
run;
PROC Star
Posts: 1,400

Re: NonEnglish Character

This solution, however, depends on your LOCALE= system option.

 

Let me know if it does not meet your needs.

Contributor
Posts: 36

Re: NonEnglish Character

I think a regular expression would be the easiest way to do it. Here is how I would do it:

 

data want;
  length mytext $100.;
  input mytext $;
  flag = ifn(prxmatch('/[^a-zA-Z0-9 ]/', mytext) > 0, 1, 0);
datalines;
Hello
Sébastien
;
run;

Respected Advisor
Posts: 4,795

Re: NonEnglish Character

@ybolduc

You need to be careful about which string functions you use when dealing with multibyte characters.

Unfortunately, the PRX...() functions only work reliably with single-byte characters.

http://support.sas.com/documentation/cdl//en/nlsref/69741/HTML/default/viewer.htm#p1pca7vwjjwucin178...

Esteemed Advisor
Posts: 5,624

Re: NonEnglish Character

Try this if you have National Language Support

 

data want;
   set have;
   flag= string ne basechar(string);
run;

Try it on a real sample of your data, not datalines text.

 

PG
Super User
Posts: 2,512

Re: NonEnglish Character

This

 

if length(CHAR) ne klength(CHAR);

will detect any character that doesn't use single-byte encoding.
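
As written, the subsetting IF keeps only the rows that contain such characters. If a flag is wanted instead, the same comparison can be assigned to a variable. This is a minimal sketch that assumes the session encoding is a multibyte one such as UTF-8 and reuses the HAVE data set and STRING variable from the earlier examples; in a single-byte session the two lengths are always equal and nothing is flagged:

```sas
data want;
   set have;
   /* LENGTH counts bytes, KLENGTH counts characters;
      they differ only when some character takes more than one byte */
   flag = (length(string) ne klength(string));
run;
```

In a UTF-8 session this also flags accented Latin letters such as é, since they take two bytes each.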
