SAS Programming

DATA Step, Macro, Functions and more
Vk_2
Obsidian | Level 7
DATA COMPONENT;
   infile datalines delimiter=',';
   length FIRST $ 1000 FIRST_B $ 1000;
   INPUT FIRST $ FIRST_B $;
   DATALINES;
Electric Component keyboard replacement, Keyboard inward component replacement
Electric Component keyboard replacement, Monitor Component Replacement
Electric Component keyboard replacement, Mouse component
Electric Component keyboard replacement, Wire Replacement
Electric Component keyboard replacement, PIN part
;

DATA Compged;
   SET COMPONENT;
   CALL COMPCOST('SWAP=', 5, 'P=', 0, 'INS=', 10, 'DEL=', 10, 'APPEND=', 5);
   First_COMPGED = COMPGED(FIRST, FIRST_B, 'iln');
RUN;

My data looks like the example above. To find how many words the two strings have in common, I split each string into individual words, one word per variable.

data split_words;
   set COMPGED;
   delims = ' ,.-!';
   array FIRST_B_WORDS[6] $15 FIRST_B1-FIRST_B6;
   array FIRST_WORDS[6] $15 FIRST1-FIRST6;

   /* The word counts do not depend on i, so compute them once */
   count_words_B = countw(FIRST_B, delims);
   count_words = countw(FIRST, delims);

   /* Use the same delimiter list for scan as for countw */
   do i = 1 to 6;
      FIRST_B_WORDS[i] = scan(FIRST_B, i, delims);
      FIRST_WORDS[i] = scan(FIRST, i, delims);
   end;

   drop i delims;
run;

Now I want to find how many words are the same between FIRST_B1-FIRST_B6 and FIRST1-FIRST6, and then decrease the COMPGED score according to that number. The goal is to improve the fuzzy match so that, in the example, "Electric Component keyboard replacement" compared with "Keyboard inward component replacement" gets the lowest score, indicating the best match.

(Attached screenshot: st_1.JPG)

I am using SAS Enterprise Guide 7.12.

Accepted Solutions
andreas_lds
Jade | Level 19

You don't need to split the words into separate variables. The functions countw, scan and findw are enough to count the words the two strings have in common.

 

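data work.same_words;
   set work.compged;

   length same i 8;
   drop i;

   /* same = number of words in first_b that also occur in first */
   same = 0;

   /* Use the same blank delimiter for countw and scan */
   do i = 1 to countw(first_b, ' ');
      /* 's' treats white space as a delimiter, 'i' ignores case, 't' trims trailing blanks */
      if findw(first, scan(first_b, i, ' '), ' ', 'sit') then do;
         same = same + 1;
      end;
   end;

   /* FIXME: adjust compged score */

run;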
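
One possible way to fill in the FIXME step, as a rough sketch rather than a definitive method: subtract a fixed amount from the COMPGED score for every shared word and keep the result from going below zero. The data set names work.pairs and work.adjusted, the variables raw_score and adj_score, and the weight word_bonus = 100 are all assumptions made for illustration. The sketch also uses the default COMPGED costs instead of the CALL COMPCOST settings from the question, so the weight would need tuning against your own costs.

```sas
/* Sketch only: data set names, variable names and the weight are assumptions */
data work.pairs;
   infile datalines delimiter=',';
   length first first_b $ 1000;
   input first $ first_b $;
   datalines;
Electric Component keyboard replacement, Keyboard inward component replacement
Electric Component keyboard replacement, Mouse component
Electric Component keyboard replacement, PIN part
;

data work.adjusted;
   set work.pairs;
   length same i raw_score adj_score 8;
   drop i;

   /* Hypothetical weight: points taken off the score per shared word */
   retain word_bonus 100;

   /* Plain COMPGED score with default costs (lower = closer match) */
   raw_score = compged(first, first_b, 'iln');

   /* Count the words of first_b that also occur in first */
   same = 0;
   do i = 1 to countw(first_b, ' ');
      if findw(first, scan(first_b, i, ' '), ' ', 'sit') then same = same + 1;
   end;

   /* Reward shared words, but never let the score drop below zero */
   adj_score = max(0, raw_score - same * word_bonus);
run;

/* Best candidates first */
proc sort data=work.adjusted;
   by adj_score;
run;
```

A flat per-word deduction is the simplest choice. Dividing the score by (1 + same) is another option if you prefer a proportional adjustment; either way, check the resulting ranking against pairs you already know to be good or bad matches.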


