TomKari
Onyx | Level 15

Hi, all

 

I need to use something like the "compged" or "complev" function to compare two text strings, but I need to process UTF-8 data containing weird and wild characters. The SAS NLS guides say that these two functions aren't certified for multi-byte character data.

 

Does anybody have any suggestions for how I can do this?

 

Much thanks,
Tom

3 REPLIES
ChrisNZ
Tourmaline | Level 20

I guess you'd have to compute the distance yourself using the k* functions.

 

1. That's a good ballot entry.

 

2. Choices must be made, as that's not a straightforward computation. What is the distance between 'hä' and 'hà'?

 

3. Syllabic scripts or ideograms would provide nice head-scratchers (though there may already be algorithms for these).

Even comparing alphabetic scripts like Arabic would not be easy, as the shape of a character changes depending on its position in the word.

     ـب     ـبـ       بـ     ب    are all letter B.
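
To make the "compute it yourself" idea concrete, here is a minimal sketch of a plain Levenshtein distance that counts Unicode characters rather than bytes. It is written in Python only as an illustration; it is not a tested SAS program, and the function name edit_distance is made up for this example. In a DATA step, the same loop would presumably walk the strings with KLENGTH and KSUBSTR. The NFC normalization step is my own assumption about one reasonable choice, so that a precomposed 'ä' and 'a' followed by a combining diaeresis compare as equal.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance counted in Unicode code points, not bytes."""
    # Assumption: compare canonically composed forms, so that equivalent
    # spellings of the same accented letter are treated as identical.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)

    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # 'hä' vs 'hà': one substitution when every character costs the same.
    print(edit_distance("h\u00e4", "h\u00e0"))
```

With uniform costs, 'hä' and 'hà' come out at distance 1, the same as 'hä' and 'hx'. Whether two accented variants of the same letter should be closer than that is exactly the kind of choice point 2 is about, and this sketch does not try to settle it.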

 

 

art297
Opal | Level 21

@TomKari: Do a Google search for: generalized edit distance utf-8 r

 

There are a number of R packages available.

 

Art, CEO, AnalystFinder.com

 

PGStats
Opal | Level 21

Look at the BASECHAR function in NLS. Without the second argument, it returns an ASCII version of your string. At least, that's what the documentation example suggests.
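
As a rough analogy for what folding to base characters does before a comparison, here is a small Python sketch. It is not BASECHAR and makes no claim about how SAS implements it; the helper name fold_to_base is hypothetical. It simply decomposes each character (NFD) and drops the combining marks, which is one common way to strip accents.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Strip combining marks so accented letters fall back to base letters."""
    # NFD splits a precomposed letter such as 'ä' into 'a' plus U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold_to_base("h\u00e4"), fold_to_base("h\u00e0"))
```

After folding, 'hä' and 'hà' both become 'ha', so any distance function applied afterwards sees them as identical. Letters with no canonical decomposition, such as 'ø', pass through unchanged with this approach, so it only handles part of the problem.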

PG
