Lisa-SAS
Calcite | Level 5

data aa;
   prty = 'TSACLLC';
   nm   = 'ST';
   match_score1 = compged(nm, prty, 200);   /* with cutoff = 200 */
   match_score2 = compged(nm, prty);        /* no cutoff */
run;

match_score1 returns 200, while match_score2 returns 70. I don't understand why match_score1 doesn't also return the correct value of 70, since the cutoff (200) is larger than the actual distance. Please help. Thanks!
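
For reference, here is one minimum-cost sequence that arrives at 70. This assumes COMPGED's default operation costs from the documentation (SWAP = 20, TRUNCATE = 10) and that the operations build string-1 ('ST') from string-2 ('TSACLLC'), as in the documentation's baboon example. Swapping the first two characters gives 'STACLLC', and truncating the five trailing characters gives 'ST'. Because the documented cutoff only caps distances that exceed it, the expected result with a cutoff of 200 is the smaller of the two values:

$$
\mathrm{GED}(\texttt{ST},\ \texttt{TSACLLC}) = \underbrace{20}_{\text{SWAP}} + \underbrace{5 \times 10}_{\text{TRUNCATE}} = 70,
\qquad
\texttt{compged(nm, prty, 200)} = \min(70,\ 200) = 70
$$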


Reeza
Super User

Do the results you get match the results in the documentation?

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/p1r4l9jwgatggtn1ko81fyjys4s7.h...

If you modify that code slightly to use the cutoff, does it work as expected?

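/* Example from the COMPGED documentation: each row pairs a test string */
/* with 'baboon' and names the edit operation(s) that relate them.      */
data test;
   infile datalines missover;
   input String1 $char8. +1 String2 $char8. +1 Operation $40.;
   GED=compged(string1, string2);   /* no cutoff */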
   datalines;
baboon   baboon   match
baXboon  baboon   insert
baoon    baboon   delete
baXoon   baboon   replace
baboonX  baboon   append
baboo    baboon   truncate
babboon  baboon   double
babon    baboon   single
baobon   baboon   swap
bab oon  baboon   blank
bab,oon  baboon   punctuation
bXaoon   baboon   insert+delete
bXaYoon  baboon   insert+replace
bXoon    baboon   delete+replace
Xbaboon  baboon   finsert
aboon    baboon   trick question: swap+delete
Xaboon   baboon   freplace
axoon    baboon   fdelete+replace
axoo     baboon   fdelete+replace+truncate
axon     baboon   fdelete+replace+single
baby     baboon   replace+truncate*2
balloon  baboon   replace+insert
;

proc print data=test label;
   label GED='Generalized Edit Distance';
   var String1 String2 GED Operation;
run;
Lisa-SAS
Calcite | Level 5

Thanks for your reply. I had read the documentation, but it doesn't answer my question. The true score should be 70. However, if I set the cutoff to 200, the function returns 200 instead of the correct score. I found that if I set the cutoff to 270 or higher, or remove the cutoff entirely, the function returns the correct value. Why?

data aa;
   prty = 'TSACLLC';
   nm   = 'ST';
   match_score1 = compged(nm, prty, 270);   /* cutoff 270 */
   match_score2 = compged(nm, prty, 200);   /* cutoff 200 */
   match_score3 = compged(nm, prty);        /* no cutoff  */
run;

prty     nm  match_score1  match_score2  match_score3
TSACLLC  ST  70            200           70
ballardw
Super User

Interesting. If you step through a range of cutoff values, as in this example:

data aa;
   prty = 'TSACLLC';
   nm = 'ST';
   /* Try every cutoff from 80 to 210 and keep one row per value */
   do cutoff = 80 to 210;
      match_score1 = compged(nm,prty,cutoff);
      output;
   end;
run;

the returned score equals the cutoff value until the cutoff reaches 201, and then it returns 70.

This probably depends on the exact strings being compared, though. As a workaround, I might try a slightly larger cutoff.
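
A sketch that extends the loop above so it reports only the problem cases. It assumes the documented rule that the cutoff only caps the result, so the expected value is the smaller of the uncapped distance and the cutoff. The variable names (ged_full, expected, mismatch) and the 50 to 300 range are illustrative choices, not part of the original code.

```sas
/* Sketch: compare COMPGED with a cutoff against the documented rule, */
/* i.e. the result should equal min(full distance, cutoff).           */
data cutoff_check;
   prty = 'TSACLLC';
   nm   = 'ST';
   ged_full = compged(nm, prty);              /* no cutoff */
   do cutoff = 50 to 300;
      ged_cut  = compged(nm, prty, cutoff);   /* with cutoff */
      expected = min(ged_full, cutoff);       /* documented result */
      mismatch = (ged_cut ne expected);       /* 1 = disagrees with the docs */
      output;
   end;
run;

/* List only the cutoff values where the two disagree */
proc print data=cutoff_check;
   where mismatch = 1;
   var cutoff ged_cut ged_full expected;
run;
```

Printing only the mismatching rows makes the affected cutoff range easy to attach to a support ticket.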

Reeza
Super User
It does seem to depend on the values. This one should probably be reported to Tech Support; @Lisa-SAS, it's worth opening a support ticket.
Lisa-SAS
Calcite | Level 5

Thank you @ballardw. I added the cutoff (110 in my program) to the function to see whether the code would run faster, and it did: because the data sets are large, the run time dropped from hours to 30 minutes. A larger cutoff value has two drawbacks: first, the issue might still occur, and second, it would reduce the efficiency gain.
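
To gauge how often the issue appears at the cutoff actually used (110), one option is a sketch like the following. It runs COMPGED both with and without the cutoff on a sample of pairs and counts disagreements with the documented rule, so only the sample (not the full data) pays the cost of the uncapped call. The data set pairs, its two rows, and the macro variable cutoff are hypothetical placeholders to replace with your own sample.

```sas
/* Hypothetical stand-in for a sample of your own name pairs;     */
/* replace it with, e.g., a random subset of the production data. */
data pairs;
   length nm prty $20;
   input nm $ prty $;
   datalines;
ST TSACLLC
baboon baXboon
;

%let cutoff = 110;   /* the cutoff used in the production program */

data cutoff_audit;
   set pairs;
   ged_cut  = compged(nm, prty, &cutoff);   /* fast, capped call */
   ged_full = compged(nm, prty);            /* uncapped reference */
   expected = min(ged_full, &cutoff);       /* documented result */
   mismatch = (ged_cut ne expected);        /* 1 = result differs from the documented rule */
run;

/* How many sampled pairs are affected at this cutoff? */
proc freq data=cutoff_audit;
   tables mismatch;
run;
```

If the mismatch rate on a realistic sample is small, the cutoff may still be acceptable; either way, the counts are useful evidence for Tech Support.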
