Hi, all
I need to use something like the "compged" or "complev" function to compare two text strings, but I need to process UTF-8 data containing weird and wild characters. The SAS NLS guides say that these two functions aren't certified for multi-byte character data.
Does anybody have any suggestions for how I can do this?
Much thanks,
Tom
I guess you'd have to compute the distance yourself using the K* functions (such as KLENGTH and KSUBSTR), which work on characters rather than bytes.
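To make the "compute it yourself" idea concrete, here is a minimal sketch of a plain Levenshtein distance. It is written in Python rather than SAS, purely as an illustration: Python strings are sequences of Unicode code points, so the same routine can be run on characters or on raw UTF-8 bytes to show why byte-oriented functions give misleading results. The function name `levenshtein` is my own and does not correspond to any SAS function, and this is not how COMPLEV or COMPGED compute their costs.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance over two sequences (str or bytes)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

s1 = "h\u00e4"  # 'h' followed by a-with-diaeresis
s2 = "ha"

# Compared as characters: one substitution.
char_dist = levenshtein(s1, s2)

# Compared as UTF-8 bytes: the accented letter is two bytes, so the
# distance is inflated.
byte_dist = levenshtein(s1.encode("utf-8"), s2.encode("utf-8"))

print(char_dist, byte_dist)  # 1 2
```

A SAS version would follow the same dynamic-programming loop, with the character-aware functions doing the indexing.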
1. That's a good ballot entry.
2. Choices must be made, as that's not a straightforward computation. What is the distance between 'hä' and 'hà'?
3. Syllabic scripts or ideograms would provide nice head-scratchers (though there may already be algorithms for these).
Even comparisons of alphabetic scripts like Arabic would not be easy, as the shape of a character changes depending on its position in the word.
ـب ـبـ بـ ب are all letter B.
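One of the choices mentioned above is which Unicode normalization to apply before comparing. The sketch below (Python, for illustration only) shows two cases. First, the same visible string can be stored precomposed or as base letter plus combining mark, and an unnormalized comparison reports a non-zero distance between them. Second, assuming the positional forms of the Arabic letter arrive as separate "presentation form" code points, NFKC normalization maps them all back to the one underlying letter. Text that stores the plain letter and leaves shaping to the renderer would not need that second step. The helper name `levenshtein` is hypothetical.

```python
import unicodedata

def levenshtein(a, b):
    """Classic Levenshtein distance over two strings (code points)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

precomposed = "h\u00e4"   # 'h' + single code point a-with-diaeresis
decomposed = "ha\u0308"   # 'h' + 'a' + COMBINING DIAERESIS

# Visually identical strings, but different code point sequences.
raw_dist = levenshtein(precomposed, decomposed)

# After NFC normalization both become the precomposed form.
nfc_dist = levenshtein(unicodedata.normalize("NFC", precomposed),
                       unicodedata.normalize("NFC", decomposed))

# Isolated, final, initial and medial presentation forms of Arabic BEH.
beh_forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
# NFKC folds each positional form to the base letter U+0628.
folded = {unicodedata.normalize("NFKC", ch) for ch in beh_forms}

print(raw_dist, nfc_dist, folded == {"\u0628"})  # 2 0 True
```

Whether NFKC is appropriate depends on the data: it also folds other compatibility characters (ligatures, full-width forms), which may or may not be what you want in a distance.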
@TomKari: Do a Google search for: generalized edit distance utf-8 r
There are a number of R packages available.
Art, CEO, AnalystFinder.com
Look at the BASECHAR function in NLS. Without the second argument, it returns an ASCII version of your string. At least, that's what the documentation example suggests.
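The "reduce to base characters, then compare" idea can be sketched outside SAS as well. The Python snippet below is an approximation of that approach, not a description of what BASECHAR actually does: it decomposes each character (NFD) and drops the combining marks. The function name `fold_to_base` is made up for this example.

```python
import unicodedata

def fold_to_base(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

a = fold_to_base("h\u00e4")  # a-with-diaeresis
b = fold_to_base("h\u00e0")  # a-with-grave

print(a, b, a == b)  # ha ha True
```

After folding, the two strings are identical, so any edit distance between them is 0. That is a deliberate choice: it treats accents as insignificant, which suits some matching tasks and not others. Letters with no canonical decomposition (for example "ø" or "ß") pass through unchanged with this method.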