BookmarkSubscribeRSS Feed
cmoore
Obsidian | Level 7

Hi,

 

Hopefully somebody can help me. I have 2 data tables. Table one has a singular column that contains a list of approx 10,000 business names such as "AstraZeneca Group", "Zurich Insurance Group", "Virgin Money PLC", etc. Table 2 contains a column that holds free text. From the column in table 2, I want to run a fuzzy match against the data stored in table 1. Therefore, if there are occurrences of "AstraZeneca Group" in table 2 in the free text column then create a new column "AstraZeneca" with a value of 1. Is there a slick and efficient way of doing this that can screen the free text column (in table 2) against a large list of variables? I am using Base SAS.

 

Thanks for your support.

 

Regards

Chris

 

2 REPLIES 2
thomp7050
Pyrite | Level 9

Here is a start.  I would try changing all to lower case and also consider soundex for misspellings.

 

data have;
infile cards delimiter=',' missover;
length ref $ 1000;
input ref $;
cards;
AstraZeneca Group,
MyGroup Rules
;
run;

data want;
infile cards delimiter=',' missover;  
length ref $ 1000;
input ref $;
cards;
I love the AstraZeneca Group!!!!!  It is so awesome!,
Oh yeah I totally love the AstraZeneca Group they rule!,
Have you ever tried MyGroup Rules?  It is sweeeeeeeet!
;

PROC SQL;
CREATE TABLE JOINS AS 
SELECT HAVE.REF AS REF1, WANT.REF AS REF2 FROM HAVE
LEFT JOIN WANT ON TRIM(WANT.REF) CONTAINS TRIM(HAVE.REF);
RUN;
art297
Opal | Level 21

There are two parts to your question. First, parsing nouns and noun phrases from text. I think you'll have to go outside of SAS for that part, namely to use one of the Natural Language Processing (NLP) packages. Take a look at: http://stackoverflow.com/questions/10974532/extracting-noun-phrases-from-a-text-file-using-stanford-...

 

As for the second part, I have found the compged function to be the most powerful for such tasks. You can find one example of how you can incorporate it in your code at: http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck

 

Art, CEO, AnalystFinder.com

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1796 views
  • 0 likes
  • 3 in conversation