Hi~
To match COMPGED scores to 100 or less in legt join, I used compged function as below.
============================================================================
libname proj "D:\SASWORK";
proc sql; *Left join ccmmatch with s34match via cusip;
create table dualmatch
as select a.*, b.mgrname, b.fdate, b.shares, b.shrout2, compged (a.lender, b.mgrname, 'I') as match
from proj.ccmmatch a left join proj.s34match b
on (a.cusip=b.s34cusip1) and (compged (a.lender, b.mgrname) le 100) and (year(a.FacilityStartDate)=year(b.FDATE)) and (month(a.FacilityStartDate) between month(b.FDATE)-2 and month(b.FDATE));
quit;
============================================================================
There is no errors in this code. However, my COBPGED scores are much larger than 100 as follows.
I would really appreciate it if anyone could give me a help.
Thanks!
Hello @Smile1 ,
COMPGED costs are very quickly above 100.
Who has told you that you need to have a generalized edit distance below 100?
You can influence the COMPGED costs yourself with the COMPCOST function or the CALL COMPCOST routine. You can also use some modifiers in the COMPGED function to compare case insensitive and / or include / exclude blanks and quotation marks in the comparison.
There's probably nothing wrong with your code, but if you are above 100, you are above 100 (and no results may be returned from the join).
Kind regards,
Koen
You might be interested in this example:
data example; x='word'; y='wood'; z=compged(x,y); run;
Comparing two 4-letter words with a single letter different: Compged value is 100. It takes very little to exceed 100 with Compged scores.
The little I use this function for I generally look for scores less than 800 for only people names. Your variable name Lender makes one suspect that could be a company name with all sorts of potential for including punctuation like period, comma or semicolon and special characters like & (A&B Bank or such). I would seldom expect a "lender" and "mgrname", which I would guess to be "manager's name" or similar, to be very close with Compged scores.
Perhaps you need SPEDIS which generates smaller differences due to a different algorithm. Or accept a larger Compged score.
Have you looked the calculation of COMGED?
Inserting one character costs 200 points. A score lower than 100 is for almost identical strings, which your data clearly isn't have for the values shown.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.