Hi,
Hopefully somebody can help me. I have 2 data tables. Table one has a singular column that contains a list of approx 10,000 business names such as "AstraZeneca Group", "Zurich Insurance Group", "Virgin Money PLC", etc. Table 2 contains a column that holds free text. From the column in table 2, I want to run a fuzzy match against the data stored in table 1. Therefore, if there are occurrences of "AstraZeneca Group" in table 2 in the free text column then create a new column "AstraZeneca" with a value of 1. Is there a slick and efficient way of doing this that can screen the free text column (in table 2) against a large list of variables? I am using Base SAS.
Thanks for your support.
Regards
Chris
Here is a start. I would try changing all to lower case and also consider soundex for misspellings.
data have;
infile cards delimiter=',' missover;
length ref $ 1000;
input ref $;
cards;
AstraZeneca Group,
MyGroup Rules
;
run;
data want;
infile cards delimiter=',' missover;
length ref $ 1000;
input ref $;
cards;
I love the AstraZeneca Group!!!!! It is so awesome!,
Oh yeah I totally love the AstraZeneca Group they rule!,
Have you ever tried MyGroup Rules? It is sweeeeeeeet!
;
PROC SQL;
CREATE TABLE JOINS AS
SELECT HAVE.REF AS REF1, WANT.REF AS REF2 FROM HAVE
LEFT JOIN WANT ON TRIM(WANT.REF) CONTAINS TRIM(HAVE.REF);
RUN;
There are two parts to your question. First, parsing nouns and noun phrases from text. I think you'll have to go outside of SAS for that part, namely to use one of the Natural Language Processing (NLP) packages. Take a look at: http://stackoverflow.com/questions/10974532/extracting-noun-phrases-from-a-text-file-using-stanford-...
As for the second part, I have found the compged function to be the most powerful for such tasks. You can find one example of how you can incorporate it in your code at: http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck
Art, CEO, AnalystFinder.com
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.