How to match and find a word or phrase in a text with a list of names (person or company)?

How can I label sequences of words in a text that are the names of things, such as people, companies, or locations? Many other software packages can perform named entity recognition (NER). Can SAS Text Miner or SAS Contextual Analysis offer anything like this?

I'd like to start with a simple project. I have a list of Fortune 1000 company names and a sample data set with texts such as:

"Acari had an accident outside Children's Place near central ave in May."

I want to first tokenize the text, match the tokens against the list of 1,000 company names to find the name (Children's Place), and then replace it with the string "company name".
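One thing to watch: many company names span several tokens ("Children's Place" is two), so matching one token at a time against the list will miss them. Below is a minimal Python sketch of dictionary (gazetteer) lookup that instead matches windows of consecutive tokens, longest first, and replaces the matched span in the original text. It is plain Python for illustration, not SAS Text Miner or SAS Contextual Analysis; `COMPANY_NAMES` is a hypothetical three-entry stand-in for the Fortune 1000 list, and the tokenizer regex is an assumption.

```python
import re

# Hypothetical stand-in for the Fortune 1000 list; load the real list from a file.
COMPANY_NAMES = ["Children's Place", "General Motors", "Apple"]

# Assumed tokenizer: words (keeping an internal apostrophe, so "Children's" is
# one token) or single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]")


def tokenize(text):
    """Return (token, start, end) triples so matches can be mapped back to the text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def replace_names(text, names, label):
    """Replace every occurrence of a listed name (case-insensitive) with `label`."""
    # Each name becomes a tuple of lowercased tokens.
    lexicon = {tuple(tok.lower() for tok, _, _ in tokenize(n)) for n in names}
    max_len = max((len(k) for k in lexicon), default=0)
    tokens = tokenize(text)
    pieces, last, i = [], 0, 0
    while i < len(tokens):
        # Try the longest window first so a multi-word name wins over any
        # shorter entry that happens to be its prefix.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(tok.lower() for tok, _, _ in tokens[i:i + n])
            if window in lexicon:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                pieces.append(text[last:start])  # text before the match
                pieces.append(label)
                last, i = end, i + n
                break
        else:
            i += 1  # no listed name starts at this token
    pieces.append(text[last:])
    return "".join(pieces)


sample = "Acari had an accident outside Children's Place near central ave in May."
print(replace_names(sample, COMPANY_NAMES, "company name"))
# Acari had an accident outside company name near central ave in May.
```

Because the match is purely lexical, a one-word entry such as "Apple" would also mask the fruit; a dictionary lookup cannot tell the two apart, which is the gap NER models try to close by using context.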

I also have a list of all American personal names and a list of street suffixes/abbreviations. I'd like to replace every person's name with "person name" and every street name with "street name".
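The same lookup idea extends to people, and street names can be approximated with a suffix rule. A minimal Python sketch under stated assumptions: `PERSON_NAMES` and `STREET_SUFFIXES` are hypothetical samples of the two lists, each person name is treated as a single word, and a street name is assumed to be exactly one word followed by a listed suffix (so a multi-word street name would be only partly masked).

```python
import re

# Hypothetical samples; load the full lists from files in practice.
PERSON_NAMES = {"acari", "maria", "john"}  # stored lowercase
STREET_SUFFIXES = ["ave", "avenue", "st", "street", "blvd", "rd", "road"]

# A word, optionally with an internal apostrophe ("O'Brien", "Children's").
WORD = r"[A-Za-z]+(?:'[A-Za-z]+)?"


def mask_people(text, names, label="person name"):
    """Replace any single word that appears in the lowercase name set."""
    return re.sub(WORD, lambda m: label if m.group().lower() in names else m.group(), text)


def mask_streets(text, suffixes, label="street name"):
    """Assumed heuristic: one word followed by a listed suffix is a street name."""
    alternation = "|".join(re.escape(s) for s in sorted(suffixes, key=len, reverse=True))
    pattern = re.compile(rf"\b{WORD}\s+(?:{alternation})\b", re.IGNORECASE)
    return pattern.sub(label, text)


sample = "Acari had an accident outside Children's Place near central ave in May."
print(mask_people(mask_streets(sample, STREET_SUFFIXES), PERSON_NAMES))
# person name had an accident outside Children's Place near street name in May.
```

Note the word "May" at the end of the sample: because May is also used as a given name, a full list of American names may contain it, and this lookup would then mask it as a person even though it is a month here. Resolving that kind of ambiguity from context is what NER tools are for.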

Ideally, I want to find any sensitive information (people's names, company names, locations, dates, times, etc.) and replace it with non-sensitive text strings.
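Dates and times tend to follow recognizable surface patterns, so regular expressions are a common first pass for them. A minimal Python sketch; the patterns cover only a few illustrative formats (month names with an optional day and year, numeric m/d/y dates, and h:mm times with an optional am/pm) and are assumptions rather than a complete list.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Case-sensitive on purpose, so the lowercase verb "may" is left alone.
DATE_RE = re.compile(
    rf"\b(?:{MONTHS})(?:\s+\d{{1,2}}(?:,\s*\d{{4}})?)?\b"  # "May", "May 3", "May 3, 2021"
    r"|\b\d{1,2}/\d{1,2}/\d{2,4}\b"  # "05/12/2021", "5/12/21"
)
# "3:45", "3:45 pm", "10:05 a.m." (a period after the final "m" is left in place)
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?:\s*[ap]\.?m\b)?", re.IGNORECASE)


def mask_dates_times(text):
    """Replace times first, then dates, with fixed placeholder strings."""
    text = TIME_RE.sub("time", text)
    return DATE_RE.sub("date", text)


print(mask_dates_times("Call at 3:45 pm on 05/12/2021."))
# Call at time on date.
```

Applied to the sample sentence, this turns "in May." into "in date.", while a person-name list might claim the same word; whichever masking step runs first decides the label unless a context-aware NER model is used.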

Any suggestions? Thanks!
