BookmarkSubscribeRSS Feed
Smile1
Calcite | Level 5

Hi~

 

To match COMPGED scores to 100 or less in legt join, I used compged function as below.

 

============================================================================
libname proj "D:\SASWORK";
proc sql; *Left join ccmmatch with s34match via cusip;
create table dualmatch
as select a.*, b.mgrname, b.fdate, b.shares, b.shrout2, compged (a.lender, b.mgrname, 'I') as match
from proj.ccmmatch a left join proj.s34match b
on (a.cusip=b.s34cusip1) and (compged (a.lender, b.mgrname) le 100) and (year(a.FacilityStartDate)=year(b.FDATE)) and (month(a.FacilityStartDate) between month(b.FDATE)-2 and month(b.FDATE));
quit;
============================================================================

 

There is no errors in this code. However, my COBPGED scores are much larger than 100 as follows.

match.PNG

I would really appreciate it if anyone could give me a help.

 

Thanks!

6 REPLIES 6
sbxkoenk
SAS Super FREQ

Hello @Smile1 ,

 

COMPGED costs are very quickly above 100.

Who has told you that you need to have a generalized edit distance below 100?

 

You can influence the COMPGED costs yourself with the COMPCOST function or the CALL COMPCOST routine. You can also use some modifiers in the COMPGED function to compare case insensitive and / or include / exclude blanks and quotation marks in the comparison.

There's probably nothing wrong with your code, but if you are above 100, you are above 100 (and no results may be returned from the join).

Kind regards,
Koen

Smile1
Calcite | Level 5
Hi Koen,

Thank you for your response. Yes, COMPGED costs are very quickly above 100.
But I think maybe there is a way to standardize COMPGED costs based on 0 - 100.
It is no doubt that we can easily recognize which name is close to the original if we can measure COMPGED costs based on 0 - 100, rather than 0 - infinite.

I will try different ways as well. Thank you for your help!
ballardw
Super User

You might be interested in this example:

data example;
   x='word';
   y='wood';
   z=compged(x,y);
run;

Comparing two 4-letter words with a single letter different: Compged value is 100. It takes very little to exceed 100 with Compged scores.

The little I use this function for I generally look for scores less than 800 for only people names. Your variable name Lender makes one suspect that could be a company name with all sorts of potential for including punctuation like period, comma or semicolon and special characters like & (A&B Bank or such). I would seldom expect a "lender" and "mgrname", which I would guess to be "manager's name" or similar, to be very close with Compged scores.

 

Perhaps you need SPEDIS which generates smaller differences due to a different algorithm. Or accept a larger Compged score.

Smile1
Calcite | Level 5
Hi Ballardw,

Thank you for your response. Yes, it takes very little to exceed 100 with Compged scores..
But I think maybe there is a way to standardize COMPGED costs based on 0 - 100.
It is no doubt that we can easily recognize which name is close to the original if we can measure COMPGED costs based on 0 - 100, rather than 0 - infinite.

Maybe I need to learn how to use SPEDIS if I can't find a way to standardize COMPGED costs based on 0 - 100.
Thank you for your help!
ChrisNZ
Tourmaline | Level 20

Have you looked the calculation of COMGED?

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p1r4l9jwgatggtn1ko81fyjys4s7.h...

Inserting one character costs 200 points. A score lower than 100 is for almost identical strings, which your data clearly isn't have for the values shown.

Smile1
Calcite | Level 5
Hi ChrisNZ,

Thank you for your response. Yes, A score lower than 100 is for almost identical strings....
But I think maybe there is a way to standardize COMPGED costs based on 0 - 100.
It is no doubt that we can easily recognize which name is close to the original if we can measure COMPGED costs based on 0 - 100, rather than 0 - infinite.

I will try different ways as well. Thank you for your help!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 507 views
  • 0 likes
  • 4 in conversation