BookmarkSubscribeRSS Feed
vasasuser
Calcite | Level 5

I have a dataset with a list of organization, company names (dataset A) such as walmart, AT & T, Children's Place, Target, a dataset with popular people names (dataset B), and a dataset consisting of text like "Harry met Sally at Walmart." (dataset C)

 

How can I match text string "Walmart" from dataset C with company name in dataset A and replace it with "comp_name"?

I'd also like to match and replace "Harry" and "Sally" from dataset C with "people_name".

 

Thanks!

5 REPLIES 5
Reeza
Super User

Do you need to account for :

Wal-Mart vs walmart vs WALMART vs Wal Mart vs wal mart

 

Please post more sample code and data if you want code for help.

 

Otherwise, consider searching on here using the terms "Fuzzy matching"

 

 

ballardw
Super User

Also what would the final result look like?

 

Do any of your company names happen to be peoples names? I can think of multiple stores where all or part of the name could show up in your person name data. How do intend to handle that?

vasasuser
Calcite | Level 5

Thank you all for replies and good questions that inspire me to think more.

 

First, I believe this is a fuzzy string match that ignore cases, blanks, etc.

 

Second, I think it's better to produce result like this:

Original text: "Harry met Sally at Walmart."

Editted text: "edit_name met edit_name at edit_name."

since there are cases when company names show up in person names.

 

The basic idea is to remove any people name, organization/company name, and ideally, location (street names or suffix/abbreation) from a sample text file. I built up a dictionary myself by compiling an excel file of company names and people names from census and other online sources, and then set up a sample text dataset with 10 records, 5 of which carry names and locations, 5 of which don't. I'd like to match the sample text dataset with my dictionaries and then replace them with "edit_name".

 

Any help on that?

Thanks!

 

 

 

 

Reeza
Super User

Quick way then, if you have a database to look into. 

 

Split the file into word components, using SCAN function to separate out the strings. 

 Result:

 

SentenceID Word

1 Harry

1 met

1 Sally

1 at

1 Walmart

2 Jack

2 and 

2 Jill

2 Went

2 Up 

2 a 

2 hill

 

Then merge this against your databases and categorize the components and then reform the data into sentences after if desired.

 

Also, if you have some money look into the Text Analytics available in SAS Enterprise Miner.

Peter_C
Rhodochrosite | Level 12
Rather than use the concept MERGE, think of solutions to this problem as a JOIN...
That leads you to PROC SQL with fuzzy logic in the ON conditions of the join

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 541 views
  • 0 likes
  • 4 in conversation