Hi, all
I need to use something like the "compged" or "complev" function to compare two text strings, but I need to process UTF-8 data containing weird and wild characters. The SAS NLS guides say that these two functions aren't certified for multi-byte character data.
Does anybody have any suggestions for how I can do this?
Much thanks,
Tom
I guess you'd have to compute the distance yourself using the k* functions.
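To make the "compute it yourself" idea concrete, here is a minimal sketch of a Levenshtein distance counted in characters rather than bytes. It is written in Python purely to illustrate the logic, not as SAS code; in a DATA step the same loop would presumably step through the strings with KLENGTH and KSUBSTR, which is an assumption about how you would port it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counted in Unicode code points, not bytes."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


print(levenshtein("hä", "hà"))
```

Because the loop works on whole characters, a multi-byte character such as "ä" counts as one position instead of two bytes.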
1. That would make a good ballot entry.
2. Choices must be made, as that's not a straightforward computation. What is the distance between 'hä' and 'hà'?
3. Syllabic scripts or ideograms would provide nice head-scratchers (though there may already be algorithms for these).
Even comparisons of alphabetic scripts like Arabic would not be easy, as the form of a character changes depending on its position in the word.
ـب ـبـ بـ ب are all letter B.
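One way to frame the choice in point 2 is as a substitution cost: two characters that differ only in their accent could cost less than two unrelated characters. The sketch below is in Python, for illustration only. The `accent_cost` parameter and its default of 0.5 are hypothetical, and "same base letter" is approximated by stripping Unicode combining marks, which is just one of several possible definitions.

```python
import unicodedata


def base(ch: str) -> str:
    """Strip combining marks from one character, e.g. 'ä' -> 'a'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def weighted_distance(a: str, b: str, accent_cost: float = 0.5) -> float:
    """Levenshtein distance where accent-only substitutions are cheaper."""
    # Compose first so that 'a' + combining mark is treated as one character.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cost = 0.0
            elif base(ca) == base(cb):
                cost = accent_cost   # same base letter, different accent
            else:
                cost = 1.0
            curr.append(min(prev[j] + 1.0,        # deletion
                            curr[j - 1] + 1.0,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


print(weighted_distance("hä", "hà"))
```

Setting `accent_cost` to 1 treats 'ä' and 'à' as entirely different characters, and setting it to 0 treats them as identical, so the question of what the distance "should" be becomes an explicit parameter.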
@TomKari: Do a Google search for: generalized edit distance utf-8 r
There are a number of R packages available.
Art, CEO, AnalystFinder.com
Look at the BASECHAR function in NLS. Without the second argument, it returns an ASCII version of your string. At least, that's what the documentation example suggests.
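The general idea of folding a string to its base characters before comparing can be sketched as follows. This is a rough Python analogue of the concept, not a reproduction of what BASECHAR does; the function name `fold_to_base` is hypothetical.

```python
import unicodedata


def fold_to_base(text: str) -> str:
    """Decompose each character, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(fold_to_base("hä") == fold_to_base("hà"))
print(fold_to_base("Crème brûlée"))
```

Once folded, accented Latin text is plain ASCII and could be passed to a byte-oriented comparison. Letters with no decomposition, such as "ø", pass through unchanged under this approach, so it only helps with scripts whose accents are separable marks.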