Hello SAS community:
I have one table named 'table1' with a column named 'X1' and a second table, 'table2', with a column named 'X2'. I want to combine a row from 'table1' and 'table2' when the value of 'X1' from 'table1' is a 'close' match to the value of 'X2' in 'table2'.
For example, if the value of 'X1' from a row in 'table1' is the string 'abc.&.123', and the value of 'X2' from a row in 'table2' is 'abc_and_123', then I would like to return a row which is the combined rows from 'table1' and 'table2' on this 'close' match between variables 'X1' from 'table1' and 'X2' from 'table2'.
I can use the LIKE operator to locate each row from each table as follows:
/*FOR table1:*/ PROC SQL; SELECT X1 FROM table1 WHERE X1 LIKE 'abc%123'; QUIT; /*FOR table2:*/ PROC SQL; SELECT X2 FROM table2 WHERE X2 LIKE 'abc%123'; QUIT;
Basically I would like to do a JOIN for the two tables on a 'close' match. I wonder if the LIKE operator can achieve this, or perhaps there is another strategy that would allow me to do the same?
Thanks.
this is a link to a paper that seems to have solved my problem. thanks again to @PaigeMiller and @SASKiwi for their helpful feedback and commentary.
Search this forum (and google the rest of the internet) for "fuzzy matching".
thanks for the tip @PaigeMiller
You haven't posted much detail on the types of fuzzy matching you want. The only example you have given looks like it is made-up. It would be helpful if you could provide more examples. Typically DATA steps are much better for fuzzy matching.
thanks for your comment @SASKiwi
this is a link to a paper that seems to have solved my problem. thanks again to @PaigeMiller and @SASKiwi for their helpful feedback and commentary.
@Varrelle - No problem, glad to hear you have figured it out for yourself. Please bear in mind that using a range of techniques is best for fuzzy matching. Start by doing exact matches, then use fuzzy techniques on the remaining data.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.