Hello,
I am trying to use the SOUNDEX function in SAS Enterprise; however, the column is in Arabic, not English.
Is there a way to handle this please?
Not easily, no.
The documentation for the SOUNDEX function describes exactly how the function works. It says that the algorithm (from the 1910s and '20s!) "is English-biased and is less useful for languages other than English." It basically encodes every word and then declares that words that have the same encoding "sound alike."
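To make the English bias concrete, here is a minimal Python sketch of a Soundex-style encoding. It is not SAS code and not the SAS implementation; the function name `soundex_sketch` is made up for illustration, and the letter classes are the classic English Soundex classes. SAS's own SOUNDEX may differ in details, so check the doc before relying on exact codes.

```python
# Classic English Soundex letter classes (letters not listed get no code).
CLASSES = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
CODES = {ch: digit for digit, letters in CLASSES.items() for ch in letters}


def soundex_sketch(word):
    """Illustrative Soundex-style code: first letter plus class digits."""
    # Only the unaccented Latin letters A-Z are recognized at all.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        # Skip uncoded letters and repeats of the class of the previous letter.
        if code and code != prev:
            out += code
        # Adjacency is judged in the original word, so an uncoded letter
        # between two letters of the same class separates them.
        prev = code
    return out


print(soundex_sketch("Robert"), soundex_sketch("Rupert"))
print(repr(soundex_sketch("محمد")))
```

"Robert" and "Rupert" get the same code, so they "sound alike." The Arabic word produces an empty code because none of its characters are in the A-Z table, which is the core of the problem.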
To work with a non-English language, you would have to determine a similar encoding that encodes the sounds (based on characters) in the desired language. You would then write an FCMP function that implements the algorithm and can be used in the DATA step. Theoretically, this is possible for languages that use single-byte characters (such as Spanish and French), but it becomes much more difficult for multiple-byte character sets such as Arabic, Korean, or Chinese. In practice, I suspect this would be extremely difficult.
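As a rough illustration of what "a similar encoding" could look like, here is a table-driven Python sketch (again, not SAS and not FCMP). Everything in it is an assumption: the names `EXAMPLE_CLASSES`, `build_table`, and `encode` are invented, and the three letter groups are placeholders chosen only to show the mechanics, not a validated phonetic scheme for Arabic. Unlike SOUNDEX, this sketch encodes every character and does not keep a leading letter.

```python
# HYPOTHETICAL letter classes, for illustration only. A real scheme would
# need input from someone who knows the phonetics of the language.
EXAMPLE_CLASSES = {
    "1": "تط",  # placeholder group: t-like letters
    "2": "سص",  # placeholder group: s-like letters
    "3": "دض",  # placeholder group: d-like letters
}


def build_table(classes):
    """Flatten {code: letters} into a per-character lookup {letter: code}."""
    return {ch: code for code, letters in classes.items() for ch in letters}


def encode(word, table):
    """Encode a word as the sequence of class codes of its letters."""
    out = []
    prev = ""
    for ch in word:  # Python iterates over characters, not bytes
        code = table.get(ch, "")
        # Letters outside the table are ignored; repeats of the class of
        # the immediately preceding letter are collapsed.
        if code and code != prev:
            out.append(code)
        prev = code
    return "".join(out)


table = build_table(EXAMPLE_CLASSES)
print(encode("سط", table), encode("صت", table))
```

The two inputs differ in both letters but fall into the same placeholder classes, so they receive the same code. The hard part is not this loop but deciding on the classes, and in SAS, handling the characters correctly when each one occupies more than one byte.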
What is the actual business problem you are trying to solve by using SOUNDEX? If you explain that then maybe there is another way that might work.
If there are only 420 unique business descriptions, then manually building a lookup table to group them shouldn't take too long. I would put these in a spreadsheet and then type in the groups. Once complete, just import the spreadsheet back into a SAS dataset and join it back to your original data.
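The logic of that lookup-and-join is simple enough to sketch. The Python below is only an illustration of the idea, not SAS code; the column names (`description`, `group`), the sample rows, and the `add_group` helper are all made up, and the CSV text stands in for the spreadsheet after export.

```python
import csv
import io

# Hypothetical lookup table, as it might look once the spreadsheet
# has been filled in by hand and exported as CSV.
lookup_csv = io.StringIO(
    "description,group\n"
    "Bakery,Food\n"
    "Bake shop,Food\n"
    "Car repair,Automotive\n"
)
lookup = {row["description"]: row["group"] for row in csv.DictReader(lookup_csv)}

# Hypothetical original data.
records = [
    {"id": 1, "description": "Bakery"},
    {"id": 2, "description": "Car repair"},
    {"id": 3, "description": "Florist"},
]


def add_group(rows, lookup, default="UNMAPPED"):
    """Left-join style: every row is kept, unmatched rows get a default."""
    return [dict(row, group=lookup.get(row["description"], default)) for row in rows]


grouped = add_group(records, lookup)
for row in grouped:
    print(row["id"], row["description"], row["group"])
```

Flagging unmatched descriptions with a default value makes it easy to spot anything that was missed when the groups were typed in.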