Text mining and content categorization

Counting Dictionary words in email address

New Contributor
Posts: 3

I have a huge list of email addresses, and I want to determine the number of dictionary words in each one. The programming language I am using is SAS.

 

For example, suppose the email addresses are as below. The output I require is:

coolgirl@email.com --> 2 dictionary words: cool and girl
angeldream@gmail.com --> 2 dictionary words: angel and dream

 

Can anyone suggest how to go about it?

Super User
Posts: 7,565

Re: Counting Dictionary words in email address

It's easy enough to get a list of words off the net; my first search came up with this:

https://github.com/dwyl/english-words

 

The real question is how you are going to lexically parse a text string to find words. There are many combinations, different meanings, different spellings, etc. Just take your example, coolgirl: what if it was coolaid? Two separate words, or the company name? What about halfpipe: should it be half and pipe, or halfpipe?

 

I think your best bet would be to investigate text analytics if you really need to do this, although it requires another license:

http://www.sas.com/en_us/software/analytics/text-miner.html
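
To make the parsing question above concrete, here is a minimal sketch of one possible rule: split the handle into the fewest dictionary words that cover it completely, so halfpipe counts as one word when it is in the list and as half + pipe otherwise. It is written in Python rather than SAS purely to illustrate the logic. The function names, the tiny word list and the three-letter minimum are assumptions made for this example, not anything from the word lists linked in this thread, and a handle containing digits or punctuation simply yields a count of 0 here.

```python
def split_handle(handle, words, min_len=3):
    """Return the shortest list of dictionary words covering `handle`, or None."""
    n = len(handle)
    # best[i] is the fewest-words split of handle[:i], or None if no split exists
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = handle[start:end]
            # skip if the prefix cannot be split, or the piece is not a usable word
            if best[start] is None or len(piece) < min_len or piece not in words:
                continue
            candidate = best[start] + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]


def count_words(email, words):
    """Count dictionary words in the part of the address before the @."""
    handle = email.split("@", 1)[0].lower()
    parts = split_handle(handle, words)
    return (len(parts), parts) if parts else (0, [])


# Hypothetical word list; a real run would load one of the downloadable lists.
words = {"cool", "girl", "angel", "dream", "half", "pipe", "halfpipe"}

print(count_words("coolgirl@email.com", words))    # (2, ['cool', 'girl'])
print(count_words("angeldream@gmail.com", words))  # (2, ['angel', 'dream'])
print(count_words("halfpipe@email.com", words))    # (1, ['halfpipe'])
```

Preferring the fewest words is only one way to settle the ambiguity; preferring the most words, or requiring a minimum word length above three, would give different counts for the same handle. The same lookup-and-split idea could be ported to a SAS DATA step, for instance with a hash object holding the word list.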

Respected Advisor
Posts: 4,756

Re: Counting Dictionary words in email address

Some word lists are available at https://sourceforge.net/projects/wordlist/files/latest/download?source=typ_redirect

 

Is this for targeted marketing? 

PG
New Contributor
Posts: 3

Re: Counting Dictionary words in email address

No, this is not for targeted marketing.

I am doing a project in which I need to determine the number of dictionary words in each email handle.

I am stuck on this question and don't know how to proceed.
