02-08-2017 10:04 AM
How can I label sequences of words in a text that are the names of things, such as person names, company names, or locations? Many other software packages can do named entity recognition (NER). Can SAS Text Miner or SAS Contextual Analysis offer anything like this?
I'd like to start with a simple project: I have a list of Fortune 1000 company names and a sample data set with texts such as
"Acari had an accident outside Children's Place near central ave in May."
I want to tokenize the text first, match the tokens against the list of 1000 company names to find the name (Children's Place), and then replace it with the string "company name".
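To make the idea concrete, here is a minimal sketch in Python (not SAS) of the dictionary lookup I have in mind. The three-name list is a made-up stand-in for the full Fortune 1000 list, and `build_pattern` and `mask_companies` are hypothetical helper names. Instead of tokenizing first, it matches whole names directly so that multi-word names like "Children's Place" are found in one pass.

```python
import re

# Hypothetical stand-in for the full Fortune 1000 list.
COMPANY_NAMES = ["Children's Place", "Apple", "General Electric"]

def build_pattern(names):
    # Try longer names first so a multi-word name wins over a shorter overlap.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # Lookarounds keep a name from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def mask_companies(text, names=COMPANY_NAMES, placeholder="company name"):
    # Replace every listed company name with the placeholder string.
    return build_pattern(names).sub(placeholder, text)

sample = "Acari had an accident outside Children's Place near central ave in May."
masked = mask_companies(sample)
print(masked)
```

A pure list lookup like this cannot tell "Apple" the company from "apple" the fruit, which is part of why I am asking about a proper NER approach.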
I also have a list of American people's names and a list of street suffixes and abbreviations. I'd like to replace every person's name with "person name" and every street name with "street name".
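Roughly, the person and street replacement could look like the following Python sketch (again not SAS). The name set and suffix list are tiny made-up samples of my real lists, `mask_streets` and `mask_people` are hypothetical helper names, and the street rule simply assumes a street name is one word followed by a known suffix, optionally preceded by a house number.

```python
import re

# Hypothetical samples of the real lists.
PERSON_NAMES = {"acari", "john", "mary"}
STREET_SUFFIXES = ["avenue", "ave", "street", "st", "road", "rd", "boulevard", "blvd"]

_suffix_alt = "|".join(sorted(map(re.escape, STREET_SUFFIXES), key=len, reverse=True))
# Optional house number, one word, then a known suffix (with optional period).
STREET_RE = re.compile(
    r"\b(?:\d+\s+)?[A-Za-z]+\s+(?:" + _suffix_alt + r")\b\.?", re.IGNORECASE
)
# A token is a run of letters, optionally with one apostrophe part ("Children's").
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def mask_streets(text, placeholder="street name"):
    return STREET_RE.sub(placeholder, text)

def mask_people(text, placeholder="person name"):
    # Replace a token only when its lowercase form is in the name list.
    def swap(match):
        token = match.group(0)
        return placeholder if token.lower() in PERSON_NAMES else token
    return TOKEN_RE.sub(swap, text)

sample = "Acari had an accident outside Children's Place near central ave in May."
result = mask_people(mask_streets(sample))
print(result)
```

Words that are both names and ordinary words (for example "May") would need context to resolve, which a plain list lookup does not provide.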
Ideally, I want to find and replace any sensitive information (people's names, company names, locations, dates, times, etc.) with non-sensitive text strings.
Any suggestions? Thanks!