thanikondharish

data names;
   length company_name $ 50;        /* allow company names up to 50 characters */
   infile cards dlm='~';            /* fields are separated by a tilde */
   input company_name $ country $;
   cards;
lenevo pvt ltd~usa
pvt lene~usa
harish industries~india
institute of harish technology~india
bata showroom ltd~usa
multi theature of bata's showroom~india
;
run;

I have created the data set above. It contains company names, and I want to identify words that sound alike or are spelled alike across records. For example, in the first and second records, 'lenevo' (record 1) and 'lene' (record 2) are similar, so the new variable should be 'lene'.

For reference, the final data set should look like this:

company_name                        country   match_spelling
lenevo pvt ltd                      usa       lene
pvt lene                            usa       lene
harish industries                   india     harish
institute of harish technology      india     harish
bata showroom ltd                   usa       bata
multi theature of bata's showroom   india     bata

ballardw

You can look at the documentation for the SOUNDEX function.

That creates an "encoded" version of the string that can be compared with the encoded version of another string; if the two encodings are the same, the strings sound alike according to the algorithm's rules.

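data example;
   string = 'banana';
   str2   = 'Bannnannna';
   a = soundex(string);   /* phonetic code for the first string */
   b = soundex(str2);     /* phonetic code for the second string */
   put a= b=;             /* write both codes to the log for comparison */
run;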

Read the documentation for details of how the algorithm works.

Sounds from different languages are not likely to be encoded consistently, because the encoding rules are based on the sounds of only one language (English).
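
To apply SOUNDEX to the company names in the question, a minimal sketch (assuming the NAMES data set created above; WORD_CODES and REC_ID are hypothetical names) splits each name into blank-delimited words, encodes each word, and sorts by code so that words with identical codes line up. Words such as harish in records 3 and 4 will share a code, but lene and lenevo will not, because the v in lenevo adds a digit to its code.

```sas
/* Sketch: SOUNDEX code for each word of each company name. */
/* Assumes the NAMES data set created in the question.      */
data word_codes;
   set names;
   length word $ 50 code $ 50;
   rec_id = _n_;                                  /* original record number */
   do i = 1 to countw(company_name, ' ');
      word = lowcase(scan(company_name, i, ' ')); /* i-th blank-delimited word */
      code = soundex(word);                       /* phonetic encoding of that word */
      output;                                     /* one row per word */
   end;
   drop i;
run;

/* Put words with identical codes next to each other */
proc sort data=word_codes;
   by code rec_id;
run;

proc print data=word_codes;
   var code word rec_id company_name;
run;
```

Rows that share a CODE value but come from different REC_ID values are the "sounds alike" candidates.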


Since your example includes words that do not sound alike because the number of syllables changes (lene vs. lenevo), you may want a measure of how closely the spellings match instead. The functions COMPGED, COMPLEV, and SPEDIS compare the spelling of two strings and score the difference; smaller scores mean more similar spellings.


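data example;
   word1 = 'lene';
   word2 = 'lenevo';
   a = compged(word1, word2);   /* generalized edit distance */
   b = complev(word1, word2);   /* Levenshtein edit distance */
   c = spedis(word1, word2);    /* asymmetric spelling distance */
   put a= b= c=;                /* smaller values = closer spellings */
run;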

You would provide additional rules for "how close is close enough".
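
One way to start on such rules is sketched below, assuming the NAMES data set from the question. It splits each company name into words, compares every word of one record with every word of every later record using COMPGED, and keeps pairs whose score is at or below a cutoff. The cutoff value of 100 and the names WORDS, CANDIDATE_MATCHES, and MAX_SCORE are hypothetical choices for illustration, not recommended settings; you would tune the cutoff on your own data.

```sas
/* Sketch: one row per word of each company name.      */
/* Assumes the NAMES data set created in the question. */
data words;
   set names;
   length word $ 50;
   rec_id = _n_;                                  /* record number used to pair records */
   do i = 1 to countw(company_name, ' ');
      word = lowcase(scan(company_name, i, ' ')); /* i-th blank-delimited word */
      output;
   end;
   keep rec_id company_name country word;
run;

%let max_score = 100;   /* hypothetical cutoff for "close enough" */

/* Compare each word with every word of every later record */
proc sql;
   create table candidate_matches as
   select a.rec_id as rec1,
          b.rec_id as rec2,
          a.word   as word1,
          b.word   as word2,
          compged(strip(a.word), strip(b.word)) as score
   from words as a, words as b
   where a.rec_id < b.rec_id              /* skip self-pairs and mirrored duplicates */
     and calculated score <= &max_score
   order by rec1, rec2, score;
quit;
```

Note that exact repeats of generic words such as pvt or showroom also score 0 here, so in practice you would likely add further rules (for example, excluding a list of common words) before deciding which word becomes match_spelling.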
