04-24-2017 07:05 AM
Hopefully somebody can help me. I have two data tables. Table 1 has a single column containing a list of approximately 10,000 business names such as "AstraZeneca Group", "Zurich Insurance Group", "Virgin Money PLC", etc. Table 2 contains a column that holds free text. I want to run a fuzzy match of the free-text column in Table 2 against the names stored in Table 1: if "AstraZeneca Group" occurs anywhere in the free text, create a new column "AstraZeneca" with a value of 1. Is there a slick and efficient way to screen the free-text column against such a large list of values? I am using Base SAS.
Thanks for your support.
04-24-2017 07:59 AM
Here is a start. I would try converting everything to lower case, and also consider SOUNDEX for misspellings.
data have;
  infile cards delimiter=',' missover;
  length ref $ 1000;
  input ref $;
  cards;
AstraZeneca Group,
MyGroup Rules
;
run;

data want;
  infile cards delimiter=',' missover;
  length ref $ 1000;
  input ref $;
  cards;
I love the AstraZeneca Group!!!!! It is so awesome!,
Oh yeah I totally love the AstraZeneca Group they rule!,
Have you ever tried MyGroup Rules? It is sweeeeeeeet!
;
run;

proc sql;
  create table joins as
  select have.ref as ref1,
         want.ref as ref2
  from have left join want
    on trim(want.ref) contains trim(have.ref);
quit;
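A minimal sketch of the lower-casing idea mentioned above, assuming the same HAVE and WANT tables from the example (the output table name JOINS_CI is just illustrative):

```sas
/* Case-insensitive variant: lower-case both sides before the CONTAINS test,
   so "astrazeneca group" in the free text still matches "AstraZeneca Group" */
proc sql;
  create table joins_ci as
  select have.ref as ref1,
         want.ref as ref2
  from have left join want
    on lowcase(trim(want.ref)) contains lowcase(trim(have.ref));
quit;
```

Note that this is still an exact substring match after normalization; SOUNDEX or an edit-distance function would be needed to catch genuine misspellings.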
04-24-2017 08:30 AM
There are two parts to your question. First, parsing nouns and noun phrases from text; for that, I think you'll have to go outside of SAS to one of the Natural Language Processing (NLP) packages. Take a look at: http://stackoverflow.com/questions/10974532/extracting-noun-phrases-from-a-text-file-using-stanford-...
As for the second part, I have found the COMPGED function to be the most powerful tool for such tasks. You can find one example of how to incorporate it in your code at: http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck
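For reference, a minimal sketch of COMPGED, which returns a generalized edit distance between two strings (lower means more similar). The variable names and the cutoff of 100 are illustrative choices, not recommendations; you would tune the cutoff against your own data:

```sas
data scored;
  length name $ 40 candidate $ 40;
  name      = 'AstraZeneca';
  candidate = 'AstraZenica';            /* misspelled word from the free text */
  dist = compged(name, candidate);      /* generalized edit distance */
  close_match = (dist <= 100);          /* 1 if within the chosen cutoff */
run;
```

In practice you would compute COMPGED between each business name and candidate words or phrases pulled from the free-text column, and keep pairs whose distance falls under your cutoff.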
Art, CEO, AnalystFinder.com