How can I label sequences of words in a text that are the names of things, such as person names, company names, or locations? I'd like to start with a simple project. I have a list of Fortune 1000 company names and a sample data set with texts such as "Acari had an accident outside Children's Place near central ave in May." I want to tokenize the text first, match the tokens against the list of 1000 company names to find the name (Children's Place), and then replace it with the string "company name". I also have a list of American people's names and a list of street suffixes and abbreviations, and I'd like to replace every person's name with "person name" and every street name with "street name". Ideally, I want to find and replace any sensitive information (people's names, company names, locations, dates, times, etc.) with non-sensitive text strings. Any suggestions? Thanks!
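To make the idea concrete, here is a minimal sketch of the list-matching approach I have in mind, using only Python's standard `re` module. The function names (`replace_names`, `redact`), the tiny sample lists, and the rule that a street name is a single word followed by a known suffix are all assumptions for illustration, not a tested solution.

```python
import re


def replace_names(text, names, placeholder):
    """Replace every occurrence of any name in `names` with `placeholder`."""
    if not names:
        return text  # an empty alternation would match everywhere
    # Try longer names first so multi-word names win over shorter overlapping ones.
    ordered = sorted(set(names), key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # (?<!\w) and (?!\w) keep matches from starting or ending inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)
    return pattern.sub(placeholder, text)


def redact(text, companies, people, street_suffixes):
    """Dictionary-based redaction of company, person, and street names."""
    # Companies go first so a name like "Children's Place" is replaced
    # before any street rule could touch part of it.
    text = replace_names(text, companies, "company name")
    text = replace_names(text, people, "person name")
    if street_suffixes:
        ordered = sorted(set(street_suffixes), key=len, reverse=True)
        suffix_alt = "|".join(re.escape(s) for s in ordered)
        # Assumption: a street name is one word followed by a known suffix.
        street = re.compile(
            r"(?<!\w)\w+\s+(?:" + suffix_alt + r")(?!\w)", re.IGNORECASE
        )
        text = street.sub("street name", text)
    return text


sample = ("Acari had an accident outside Children's Place "
          "near central ave in May.")
result = redact(
    sample,
    companies=["Children's Place"],   # stand-in for the Fortune 1000 list
    people=["Acari"],                 # stand-in for the people-name list
    street_suffixes=["ave", "st", "rd"],
)
print(result)
```

This only catches names that appear in the lists, and it leaves "May" untouched; dates, times, and unlisted names would need extra patterns or a trained named-entity recognizer.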