Solved: Finding how many same words exist between two strings.

Vk_2 · Posted 07-12-2018 04:16 PM

DATA COMPONENT;
 infile datalines delimiter=','; 
 length FIRST $ 1000 FIRST_B $ 1000;
 INPUT FIRST $ FIRST_B $;
 DATALINES;
 Electric Component keyboard replacement, Keyboard inward component replacement
 Electric Component keyboard replacement, Monitor Component Replacement
 Electric Component keyboard replacement, Mouse component
 Electric Component keyboard replacement, Wire Replacement
 Electric Component keyboard replacement, PIN part
 ;
 
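 DATA Compged;
 SET COMPONENT;
 /* Set COMPGED operation costs: swap, punctuation, insert, delete, append */
 CALL COMPCOST('SWAP=', 5, 'P=', 0, 'INS=', 10,'DEL=',10,'APPEND=',5);
 /* Generalized edit distance; i = ignore case, l = remove leading blanks */
 First_COMPGED=COMPGED(FIRST, FIRST_B, 'iln');
 RUN;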

My data looks like the sample above. To find how many words the two strings have in common, I split each string into its words, one word per column.

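data split_words;
   set COMPGED;
   delims = ' ,.-!';                      /* word delimiters, used consistently below */
   array FIRST_B_WORDS[6] $15 FIRST_B1-FIRST_B6;
   array FIRST_WORDS[6]   $15 FIRST1-FIRST6;

   /* Total number of words in each string (computed once, not on every loop pass) */
   count_words_B = countw(FIRST_B, delims);
   count_words   = countw(FIRST, delims);

   /* Store the first 6 words of each string in separate columns */
   do i = 1 to 6;
      FIRST_B_WORDS[i] = scan(FIRST_B, i, delims);
      FIRST_WORDS[i]   = scan(FIRST, i, delims);
   end;

   drop i delims;
run;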

Now I want to count how many words FIRST_B1-FIRST_B6 and FIRST1-FIRST6 have in common, and then lower the COMPGED score according to that count, to improve the fuzzy matching. For example, "Electric Component keyboard replacement" compared with "Keyboard inward component replacement" should get the lowest score, indicating the best match.
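A minimal sketch of the comparison described above, using nested DO loops over two word arrays. It is self-contained: the data set names WORK.SAMPLE_PAIRS and WORK.COMMON_WORDS and the variable SAME are hypothetical, the two sample rows are taken from the question's data, and words are split on blanks only (an assumption; add delimiters as needed).

```sas
/* Hypothetical sample data: two of the pairs from the question */
data work.sample_pairs;
   infile datalines dlm=',';
   length first first_b $ 100;
   input first $ first_b $;
   datalines;
Electric Component keyboard replacement,Keyboard inward component replacement
Electric Component keyboard replacement,Wire Replacement
;

data work.common_words;
   set work.sample_pairs;
   array first_words[6]   $15 first1-first6;
   array first_b_words[6] $15 first_b1-first_b6;

   /* Split each string into at most 6 words; unused slots stay blank */
   do i = 1 to 6;
      first_words[i]   = scan(first,   i, ' ');
      first_b_words[i] = scan(first_b, i, ' ');
   end;

   /* Compare every FIRST_B word with every FIRST word, ignoring case */
   same = 0;
   do j = 1 to 6;
      if missing(first_b_words[j]) then continue;  /* skip unused slots */
      do k = 1 to 6;
         if upcase(first_b_words[j]) = upcase(first_words[k]) then do;
            same = same + 1;
            leave;   /* count each FIRST_B word at most once */
         end;
      end;
   end;

   drop i j k;
run;
```

The LEAVE statement keeps a FIRST_B word from being counted more than once, even if it appears several times in FIRST.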

I am using SAS EG 7.12.

andreas_lds · Posted 07-13-2018 01:08 AM

You don't need to split the words; you can use the functions COUNTW, SCAN and FINDW to count the words the two strings have in common.

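data work.same_words;
   set work.compged;

   length same i 8;
   drop i;

   same = 0;

   /* Loop over the blank-separated words of FIRST_B */
   do i = 1 to countw(first_b, ' ');
      /* FINDW modifiers: s = space characters delimit words, i = ignore case, t = trim trailing blanks */
      if findw(first, scan(first_b, i, ' '), ' ', 'sit') then do;
         same = same + 1;
      end;
   end;

   /* FIXME: adjust compged score */

run;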
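One possible way to fill in the FIXME, as a sketch only: the rule below (dividing the COMPGED distance by 1 + the number of shared words) is an assumption, not part of the accepted answer or a SAS default, so the weighting should be tuned to the real data. WORK.SAMPLE_PAIRS, WORK.ADJUSTED and ADJUSTED_COMPGED are hypothetical names, and the COMPCOST call from the question is left out, so COMPGED uses its default costs here.

```sas
/* Hypothetical sample data: three of the pairs from the question */
data work.sample_pairs;
   infile datalines dlm=',';
   length first first_b $ 100;
   input first $ first_b $;
   datalines;
Electric Component keyboard replacement,Keyboard inward component replacement
Electric Component keyboard replacement,Monitor Component Replacement
Electric Component keyboard replacement,PIN part
;

data work.adjusted;
   set work.sample_pairs;
   length same i 8;
   drop i;

   /* Raw generalized edit distance, ignoring case and leading blanks */
   first_compged = compged(first, first_b, 'il');

   /* Count the words of FIRST_B that also occur as whole words in FIRST */
   same = 0;
   do i = 1 to countw(first_b, ' ');
      if findw(first, scan(first_b, i, ' '), ' ', 'sit') then same = same + 1;
   end;

   /* Hypothetical adjustment: more shared words give a lower (better) score */
   adjusted_compged = first_compged / (1 + same);
run;
```

Because a lower COMPGED score means a closer match, sorting by ADJUSTED_COMPGED in ascending order moves pairs that share more words toward the top; a fixed penalty per shared word would be an equally valid choice.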
