I have to compare two company-name variables to match data. I am using the three functions in the title.
compged = compged(name,name2);
complev = complev(name,name2);
spedis = spedis(name,name2);
What are the minimum and maximum values for each function, and what do they mean? To my understanding, lower is better and 0 means an exact match. However, COMPGED returns a very high number (600 or 1000) for closely matched observations.
The attached photo shows some examples. The first 4 rows should be a match and the bottom 2 should not. All 3 functions return very high numbers. I am aware that this is because of missing letters in variable NAME2 for the first 4 rows. What I want is a match for the first 4 rows and no match for the last 2, but the values from the 3 functions do not separate the two groups.
What would be a better approach?
From a quick look at your example, you should seriously consider using the options to ignore case (the modifiers argument of COMPGED and COMPLEV). Some of the results you get are being inflated by case differences.
The documentation for each function tells you what is considered and how values are assigned. One note from the documentation is that COMPGED and COMPLEV are faster than SPEDIS.
COMPGED can also work with CALL COMPCOST, which lets you set how much each type of edit operation costs, though this is not an exercise for the faint of heart. It could be a serious advantage if you know a lot about how the strings from your different sources tend to differ.
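Something like the following might be a starting point for the case-insensitive comparison. It is only a sketch: HAVE and WANT are placeholder dataset names, and HAVE is assumed to contain NAME and NAME2. The 'i' modifier tells COMPGED and COMPLEV to ignore case, and ':' truncates the longer string to the length of the shorter one before comparing, which may help if NAME2 is simply cut short. SPEDIS does not take modifiers, so the sketch upcases both arguments instead.

```sas
data want;                                    /* WANT and HAVE are placeholder names */
   set have;                                  /* assumed to contain NAME and NAME2 */

   /* 'i' = ignore case */
   ged_i  = compged(name, name2, 'i');
   lev_i  = complev(name, name2, 'i');

   /* 'i:' = ignore case and truncate the longer string
      to the length of the shorter one before comparing */
   ged_it = compged(name, name2, 'i:');
   lev_it = complev(name, name2, 'i:');

   /* SPEDIS has no modifiers argument, so normalize case by hand */
   sped_u = spedis(upcase(name), upcase(name2));
run;
```

Comparing the new scores for the rows you know are matches against the rows you know are not should show whether a single threshold can separate them.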
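A minimal sketch of the CALL COMPCOST idea, assuming the mismatches come mostly from characters missing at the end of one name. APPEND and TRUNCATE are the COMPGED operations for adding or removing characters at the end of a string; the cost values below are made up for illustration and would need tuning on your own data. HAVE and WANT_COST are placeholder dataset names.

```sas
data want_cost;                               /* placeholder dataset names */
   set have;                                  /* assumed to contain NAME and NAME2 */

   /* Set the costs once for this DATA step.
      The values 5 and 5 are hypothetical, not recommendations. */
   if _n_ = 1 then call compcost('append=', 5, 'truncate=', 5);

   /* COMPGED now uses the adjusted costs; 'i' still ignores case */
   ged_custom = compged(name, name2, 'i');
run;
```

Lowering only the end-of-string costs keeps replacements and insertions in the middle of a name expensive, so names that differ in the middle should still score well above names that are merely shortened.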