TomKari
Onyx | Level 15

Hi, all

I need to use something like the COMPGED or COMPLEV function to compare two text strings, but I need to process UTF-8 data containing weird and wild characters. The SAS NLS guides say that these two functions aren't certified for multi-byte character data.

Does anybody have any suggestions for how I can do this?

Much thanks,
Tom

ChrisNZ
Tourmaline | Level 20

I guess you'd have to compute the distance yourself using the K functions (KLENGTH, KSUBSTR, etc.), which work on characters rather than bytes.
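Computing it yourself amounts to running the usual edit-distance dynamic program over characters instead of bytes. In SAS that would mean stepping through each string with KSUBSTR up to its KLENGTH. The sketch below is in Python purely to show the idea; it is not SAS and is not claimed to reproduce COMPLEV's options or exact results. Comparing the same two strings as raw UTF-8 bytes shows why a byte-oriented comparison goes wrong: 'ä' is two bytes, so one changed letter looks like two edits.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences.

    Pass str to compare Unicode characters, or bytes to compare raw bytes.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


s, t = "h\u00e4", "ha"                                     # 'hä' vs 'ha'
print(levenshtein(s, t))                                   # 1: characters
print(levenshtein(s.encode("utf-8"), t.encode("utf-8")))   # 2: UTF-8 bytes
```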
 

1. That would make a good SASware Ballot entry.

2. Choices must be made, as this is not a straightforward computation. What is the distance between 'hä' and 'hà'?

3. Syllabic scripts or ideograms would provide nice head-scratchers (though there may already be algorithms for these).

Even comparing alphabetic scripts such as Arabic would not be easy, as a letter's shape changes depending on its position in the word:

     ـب     ـبـ       بـ     ب    are all the letter B (beh).
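Points 2 and 3 can at least be made explicit in code. The Python sketch below (not SAS, and only one hypothetical set of choices) normalizes both strings with NFKC, which folds Arabic presentation-form code points such as U+FE91 back to the nominal letter U+0628 and composes decomposed accents. It then runs an edit distance in which swapping two letters with the same base letter costs 0.5 instead of 1. In ordinary Unicode text the four shapes of beh are usually all stored as U+0628 and shaped only when displayed, so the folding matters only if the data contains presentation-form characters.

```python
import unicodedata


def base_letter(ch):
    """First code point of ch's canonical decomposition: 'ä' -> 'a'."""
    return unicodedata.normalize("NFD", ch)[0]


def weighted_distance(a, b, accent_cost=0.5):
    """Levenshtein distance over Unicode characters with two explicit choices:

    * NFKC first, so Arabic presentation forms (U+FE8F..U+FE92 for beh)
      fold to U+0628 and decomposed accents are composed;
    * substituting letters that share a base letter ('ä' vs 'à') costs
      accent_cost (a hypothetical 0.5) instead of 1.
    """
    a = unicodedata.normalize("NFKC", a)
    b = unicodedata.normalize("NFKC", b)
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0
            elif base_letter(ca) == base_letter(cb):
                sub = accent_cost
            else:
                sub = 1.0
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + sub))  # substitute ca -> cb
        prev = curr
    return prev[-1]


print(weighted_distance("h\u00e4", "h\u00e0"))  # 'hä' vs 'hà': 0.5
print(weighted_distance("h\u00e4", "hb"))       # different letter: 1.0
print(weighted_distance("\uFE91", "\u0628"))    # initial-form beh vs beh: 0.0
```

The 0.5 is arbitrary; 0 (ignore accents) or 1 (treat them as different letters) are equally defensible. NFKC also folds other compatibility characters such as ligatures and full-width letters, and the base-letter rule is aimed at Latin script: Hangul syllables also decompose under NFD, so '가' and '각' would be charged 0.5 as well.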


art297
Opal | Level 21

@TomKari: Do a Google search for "generalized edit distance utf-8 r".

There are a number of R packages available.

Art, CEO, AnalystFinder.com

PGStats
Opal | Level 21

Look at the BASECHAR function in NLS. Without the second argument, it returns an ASCII version of your string. At least, that's what the documentation example suggests.
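If folding to base characters is acceptable, the distance can then be computed on the folded strings with an ordinary edit distance. The sketch below is in Python and only approximates the idea behind BASECHAR; it is not SAS's implementation, and the two may disagree on some characters. It decomposes with NFD, drops combining marks, and then runs a plain Levenshtein distance. Letters with no canonical decomposition, such as 'ø' or 'ß', are left as they are.

```python
import unicodedata


def strip_accents(s):
    """Decompose (NFD) and drop combining marks: 'Müller' -> 'Muller'.

    Letters with no canonical decomposition, such as 'ø' or 'ß', are kept.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def levenshtein(a, b):
    """Plain unit-cost edit distance over Unicode characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def folded_distance(a, b):
    """Edit distance after folding both strings to their base letters."""
    return levenshtein(strip_accents(a), strip_accents(b))


print(folded_distance("h\u00e4", "h\u00e0"))   # 'hä' vs 'hà': 0
print(folded_distance("M\u00fcller", "Mueller"))  # 'Muller' vs 'Mueller': 1
```

This answers Chris's 'hä' versus 'hà' question one particular way (distance 0); whether that is right depends on the data.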

PG
