data aa;
prty = 'TSACLLC';
nm = 'ST';
match_score1 = compged(nm,prty,200);
match_score2 = compged(nm,prty);
run;
match_score1 returns 200, while match_score2 returns 70. I can't understand why match_score1 doesn't return the correct value of 70. Please help. Thanks!
Do the results you get match the documentation results?
If you modify that code slightly to use the cutoff, does it work as expected?
data test;
   infile datalines missover;
   input String1 $char8. +1 String2 $char8. +1 Operation $40.;
   GED=compged(string1, string2);
   datalines;
baboon   baboon   match
baXboon  baboon   insert
baoon    baboon   delete
baXoon   baboon   replace
baboonX  baboon   append
baboo    baboon   truncate
babboon  baboon   double
babon    baboon   single
baobon   baboon   swap
bab oon  baboon   blank
bab,oon  baboon   punctuation
bXaoon   baboon   insert+delete
bXaYoon  baboon   insert+replace
bXoon    baboon   delete+replace
Xbaboon  baboon   finsert
aboon    baboon   trick question: swap+delete
Xaboon   baboon   freplace
axoon    baboon   fdelete+replace
axoo     baboon   fdelete+replace+truncate
axon     baboon   fdelete+replace+single
baby     baboon   replace+truncate*2
balloon  baboon   replace+insert
;
proc print data=test label;
label GED='Generalized Edit Distance';
var String1 String2 GED Operation;
run;
Thanks for your reply. I had read the documentation, but it doesn't answer my question. The true score should be 70. However, with a cutoff of 200, the function returns 200 instead of the correct score. I found that with a cutoff >= 270, or with no cutoff at all, the function returns the correct value. Why?
data aa;
prty = 'TSACLLC';
nm = 'ST';
match_score1 = compged(nm,prty,270);
match_score2 = compged(nm,prty,200);
match_score3 = compged(nm,prty);
run;
prty | nm | match_score1 | match_score2 | match_score3 |
TSACLLC | ST | 70 | 200 | 70 |
Interesting. If you step through a range of cutoff values, like this:
data aa;
   prty = 'TSACLLC';
   nm = 'ST';
   do cutoff = 80 to 210;
      match_score1 = compged(nm,prty,cutoff);
      output;
   end;
run;
the function returns the cutoff value as the score until cutoff=201, after which it returns 70.
This very likely depends on the exact values used, though. I might try a slightly larger cutoff as a workaround.
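To see how far the cutoff has to be raised for a given pair of strings, one option is to compare the capped score against the uncapped one across a range of cutoffs. The following is only a sketch: the data set name cutoff_check, the variables uncapped, capped and suspicious, and the range 80 to 300 are illustrative choices, not anything from the original program.

```sas
data cutoff_check;
   prty = 'TSACLLC';
   nm = 'ST';
   /* reference score with no cutoff */
   uncapped = compged(nm, prty);
   do cutoff = 80 to 300 by 10;
      capped = compged(nm, prty, cutoff);
      /* flag cutoffs that are above the uncapped score
         but still give a different result */
      suspicious = (uncapped < cutoff and capped ne uncapped);
      output;
   end;
run;

proc print data=cutoff_check;
   where suspicious = 1;
   var prty nm cutoff capped uncapped;
run;
```

Running the same comparison on a small sample of real pairs would show whether a chosen cutoff is large enough for that data before using it on the full data sets.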
Thank you @ballardw. I added a cutoff (110 in my program) to see whether the code would run faster. It did: because the data sets are large, the run time dropped from hours to 30 minutes. If I use a larger cutoff value, first, the issue might still exist, and second, efficiency will decrease.