Hello,
I am trying to use the SOUNDEX function in SAS Enterprise, but the column is in Arabic, not English.
Is there a way to handle this?
Not easily, no.
The documentation for the SOUNDEX function describes exactly how the function works. It says that the algorithm (from the 1910s and '20s!) "is English-biased and is less useful for languages other than English." Basically, it encodes every word and then declares that words with the same encoding "sound alike."
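To make the encoding idea concrete, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map consonants to digits, drop vowels, collapse repeated digits, and pad or truncate to four characters. SAS's SOUNDEX follows the same idea, but its exact output may differ in details (such as code length), so treat this as an illustration of the technique, not a reimplementation of the SAS function. Note that it only recognizes the letters A-Z, which is exactly why Arabic input gets nothing useful.

```python
# Classic American Soundex digit groups.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    # Keep only ASCII letters; anything else (including Arabic) is discarded.
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only for a coded letter that differs from the previous code.
        if code and code != prev:
            result.append(code)
        # Vowels reset the previous code; 'h' and 'w' do not.
        if ch not in "hw":
            prev = code
    # Pad with zeros or truncate to four characters.
    return ("".join(result) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(repr(soundex("محمد")))                 # '' (no A-Z letters to encode)
```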
To work with a non-English language, you would have to devise a similar encoding that captures the sounds (based on characters) of that language, and then write an FCMP function that implements the algorithm so that it can be called from the DATA step. In theory, this is possible for languages written with single-byte character sets (such as Spanish and French), but it becomes much more difficult for multibyte character sets such as Arabic, Korean, or Chinese. In practice, I suspect it would be extremely difficult.
What is the actual business problem you are trying to solve by using SOUNDEX? If you explain that then maybe there is another way that might work.
If there are only 420 unique business descriptions, then manually building a lookup table to group them shouldn't take long. I would put them in a spreadsheet and type in the groups. Once that's done, import the spreadsheet back into a SAS dataset and join it to your original data.
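The lookup-table approach amounts to a left join. In SAS you would import the spreadsheet (for example with PROC IMPORT) and merge or join it onto the original data; the Python sketch below shows the same logic. The column names `description` and `group` and the sample rows are hypothetical stand-ins for your own spreadsheet.

```python
import csv
import io

# Stand-in for the exported spreadsheet after typing in the groups.
# Column names and rows are hypothetical.
LOOKUP_CSV = """description,group
مطعم,restaurants
مطاعم,restaurants
صيدلية,pharmacy
"""


def load_lookup(text: str) -> dict:
    # Map each unique description to the group typed in by hand.
    return {row["description"]: row["group"]
            for row in csv.DictReader(io.StringIO(text))}


def join_groups(records: list, lookup: dict) -> list:
    # Left join: keep every record; unmatched descriptions get None.
    return [{**rec, "group": lookup.get(rec["description"])} for rec in records]


data = [{"id": 1, "description": "مطاعم"}, {"id": 2, "description": "مخبز"}]
for row in join_groups(data, load_lookup(LOOKUP_CSV)):
    print(row["id"], row["group"])
```

Unmatched rows (group `None` here, a missing value in SAS) are an easy way to spot descriptions that still need a group.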