topic Re: Fuzzy Matching in SAS Programming

Fuzzy Matching

cmoore — Mon, 24 Apr 2017 11:05:02 GMT

Hi,

Hopefully somebody can help me. I have 2 data tables. Table one has a singular column that contains a list of approx 10,000 business names such as "AstraZeneca Group", "Zurich Insurance Group", "Virgin Money PLC", etc. Table 2 contains a column that holds free text. From the column in table 2, I want to run a fuzzy match against the data stored in table 1. Therefore, if there are occurrences of "AstraZeneca Group" in table 2 in the free text column then create a new column "AstraZeneca" with a value of 1. Is there a slick and efficient way of doing this that can screen the free text column (in table 2) against a large list of variables? I am using Base SAS.

Thanks for your support.

Regards

Chris

Re: Fuzzy Matching

thomp7050 — Mon, 24 Apr 2017 11:59:02 GMT

Here is a start. I would try changing all to lower case and also consider soundex for misspellings.

data have;
infile cards delimiter=',' missover;
length ref $ 1000;
input ref $;
cards;
AstraZeneca Group,
MyGroup Rules
;
run;

data want;
infile cards delimiter=',' missover;  
length ref $ 1000;
input ref $;
cards;
I love the AstraZeneca Group!!!!!  It is so awesome!,
Oh yeah I totally love the AstraZeneca Group they rule!,
Have you ever tried MyGroup Rules?  It is sweeeeeeeet!
;

PROC SQL;
CREATE TABLE JOINS AS 
SELECT HAVE.REF AS REF1, WANT.REF AS REF2 FROM HAVE
LEFT JOIN WANT ON TRIM(WANT.REF) CONTAINS TRIM(HAVE.REF);
RUN;

Re: Fuzzy Matching

art297 — Mon, 24 Apr 2017 12:30:02 GMT

There are two parts to your question. First, parsing nouns and noun phrases from text. I think you'll have to go outside of SAS for that part, namely to use one of the Natural Language Processing (NLP) packages. Take a look at: http://stackoverflow.com/questions/10974532/extracting-noun-phrases-from-a-text-file-using-stanford-typed-parser

As for the second part, I have found the compged function to be the most powerful for such tasks. You can find one example of how you can incorporate it in your code at: http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck

Art, CEO, AnalystFinder.com