/* Read ~-delimited company names and countries from in-stream data */
data names;
   length company_name $ 50;
   infile cards dlm='~';
   input company_name $ country $;
   cards;
lenevo pvt ltd~usa
pvt lene~usa
harish industries~india
institute of harish technology~india
bata showroom ltd~usa
multi theature of bata's showroom~india
;
run;
I have created the data set above. It contains company names, and I want to identify words that sound alike or are spelled similarly across records. For example, the first and second records contain "lenevo" and "lene", so the new variable should be 'lene'.
For reference, this is the final data set I want:
company_name                        country  match_spelling
lenevo pvt ltd                      usa      lene
pvt lene                            usa      lene
harish industries                   india    harish
institute of harish technology      india    harish
bata showroom ltd                   usa      bata
multi theature of bata's showroom   india    bata
Look at the documentation for the SOUNDEX function.
It creates an "encoded" version of a string that can be compared with the encoded version of another string to see whether the two sound alike.
data example;
   string = 'banana';
   str2 = 'Bannnannna';
   a = soundex(string);
   b = soundex(str2);
   put a= b=;
run;
The documentation also briefly describes how the encoding algorithm works.
Sounds are unlikely to be encoded consistently across languages, because the encoding rules are based on the sounds of a single language (the algorithm is English-biased).
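Because SOUNDEX encodes the entire string, names that contain different extra words (harish industries vs. institute of harish technology) are unlikely to share a code. A minimal sketch, assuming words are separated by blanks, compares two names word by word instead. The variables NAME1, NAME2, W1, and W2 are hypothetical, chosen only for illustration.

```sas
/* Sketch: compare two company names word by word with SOUNDEX.     */
/* NAME1/NAME2 and the blank delimiter are illustrative assumptions. */
data _null_;
   length name1 name2 w1 w2 $ 50;
   name1 = 'harish industries';
   name2 = 'institute of harish technology';
   do i = 1 to countw(name1, ' ');
      w1 = scan(name1, i, ' ');
      do j = 1 to countw(name2, ' ');
         w2 = scan(name2, j, ' ');
         /* Equal codes suggest the two words sound alike */
         if soundex(w1) = soundex(w2) then put 'Sounds alike: ' w1= w2=;
      end;
   end;
run;
```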
Your specific example includes words that do not sound alike, because the number of syllables changes (lene vs. lenevo). So you may want a measure of how closely the words are spelled instead, which is what the COMPGED, COMPLEV, and SPEDIS functions provide: they compare the spelling of two strings and score the difference, with smaller scores meaning closer spelling.
data example;
   word1 = 'lene';
   word2 = 'lenevo';
   a = compged(word1, word2);
   b = complev(word1, word2);
   c = spedis(word1, word2);
   put a= b= c=;
run;
You would then need to add your own rules for "how close is close enough".
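As one possible rule, the sketch below splits each name into words and pairs words from different records whose COMPLEV score is at most 2, taking the shorter word of the pair as MATCH_SPELLING. The threshold of 2, the 3-character minimum word length, and the list of generic words to skip (pvt, ltd, showroom) are assumptions picked for this sample and would need tuning on real data. It also assumes the NAMES data set from the question has already been created; WORDS, MATCHES, and REC_ID are hypothetical names.

```sas
/* Step 1: one row per word. Blanks and apostrophes are treated as */
/* delimiters, so "bata's" yields "bata" and "s".                  */
data words;
   set names;
   length word $ 50;
   rec_id = _n_;                        /* identifies the source record */
   do i = 1 to countw(company_name, " '");
      word = lowcase(scan(company_name, i, " '"));
      /* Hypothetical rules: skip words shorter than 3 characters and */
      /* generic terms shared by unrelated names (tune for your data). */
      if length(word) >= 3 and word not in ('pvt' 'ltd' 'showroom') then output;
   end;
   keep rec_id company_name country word;
run;

/* Step 2: compare each word with every word of the other records.  */
/* A Levenshtein distance (COMPLEV) of 2 or less counts as a match. */
proc sql;
   create table matches as
   select distinct a.rec_id, a.company_name, a.country,
          case when length(a.word) <= length(b.word) then a.word
               else b.word
          end as match_spelling length=50
   from words as a, words as b
   where a.rec_id ne b.rec_id
     and complev(a.word, b.word) <= 2
   order by a.rec_id;
quit;
```

A record that is close to more than one other word can still produce several rows; DISTINCT only removes exact duplicates, so you may need an extra rule for choosing one match per record.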