I have to compare two company name variables to match data. I am using the three functions in the title.
compged = compged(name,name2);
complev = complev(name,name2);
spedis = spedis(name,name2);
What are the minimum and maximum values for each function, and what do they mean? To my understanding, lower is better and 0 means an exact match. However, COMPGED returns a very high number (600 or 1000) for closely matched observations.
The attached photo shows some examples. The first 4 rows should be a match and the bottom 2 should not. All 3 functions return very high numbers. I am aware that this is because letters are missing from NAME2 in the first 4 rows. What I want is a match for the first 4 rows and no match for the last 2, but the values from the 3 functions do not separate the two groups.
What would be a better approach?
From a quick look at your example, you should seriously consider using the options to ignore case. Some of the scores you are getting are inflated by case differences alone.
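As a rough sketch of what ignoring case could look like: COMPGED and COMPLEV accept an optional modifiers argument, where 'i' ignores case and 'l' removes leading blanks. SPEDIS takes no modifiers, so the inputs are upcased by hand here. The dataset names HAVE and WANT and the new variable names are assumptions for illustration; NAME and NAME2 are the variables from the question.

```sas
/* Sketch only: HAVE/WANT and the *_i / *_u variable names are hypothetical */
data want;
   set have;
   /* 'i' = ignore case, 'l' = remove leading blanks before comparing */
   ged_i    = compged(name, name2, 'il');
   lev_i    = complev(name, name2, 'il');
   /* SPEDIS has no modifiers argument, so normalize the inputs first */
   spedis_u = spedis(upcase(strip(name)), upcase(strip(name2)));
run;
```

Since NAME2 is missing trailing letters in the rows that should match, the ':' modifier (which truncates the longer string to the length of the shorter one) may also be worth testing, for example compged(name, name2, 'il:'). Whether it separates your matches from your non-matches depends on the data.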
The documentation for each function describes which differences are considered and how they are scored. One note from the documentation is that COMPGED and COMPLEV are faster than SPEDIS.
COMPGED can also work with CALL COMPCOST, which lets you set how much each type of edit operation contributes to the score, though this is not an exercise for the faint of heart. It can be a serious advantage if you know a lot about how the strings from your two sources tend to differ.
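A minimal sketch of the CALL COMPCOST idea, assuming the same hypothetical HAVE dataset. The cost values below are made up for illustration and would need tuning against rows you already know to be matches and non-matches. CALL COMPCOST only affects COMPGED, and it needs to run just once before the comparisons.

```sas
/* Sketch only: the costs 5 and 10 are arbitrary illustrative values */
data want;
   set have;
   /* Set custom costs once, on the first iteration of the DATA step */
   if _n_ = 1 then call compcost('append=', 5, 'truncate=', 5, 'insert=', 10);
   /* COMPGED now scores those operations with the costs set above */
   ged_custom = compged(name, name2, 'il');
run;
```

Lowering the cost of appended or truncated characters is one way to make a shortened name score closer to its full version, but it will also pull some unrelated names closer, so check the result against the rows that should not match.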