04-22-2016 07:16 AM
I have a huge list of email addresses. I want to determine the number of dictionary words in each email address. The programming language I am using is SAS.
Example: suppose the email addresses are as below. The output I require is:
email@example.com --> 2 dictionary words - cool and girl
firstname.lastname@example.org --> 2 dictionary words - angel and dream
Can anyone suggest how to go about it?
04-22-2016 08:06 AM
Whilst it's easy enough to get a list of words off the net, my first search came up with this:
The question is how are you going to lexicographically parse a text string to find words? There are many combinations, different meanings, different spellings etc. Just take your example: coolgirl, what if it was coolaid? Two separate words, or the company name? What about halfpipe, should it be half and pipe, or halfpipe?
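To make the ambiguity concrete, here is a minimal sketch of one possible approach, written in Python purely for illustration (the logic would still need porting to SAS). It assumes a tiny hypothetical word list, hypothetical example addresses, and a made-up rule: skip as few characters as possible, then prefer fewer, longer words. The names `WORDS`, `segment` and `words_in_email` are invented for this example.

```python
import re

# Hypothetical toy dictionary; a real run would load a full word list from a file.
WORDS = {"cool", "girl", "angel", "dream", "half", "pipe", "halfpipe"}


def segment(token, words, min_len=3):
    """Split token into dictionary words, skipping as few characters as possible.

    best[i] holds (skipped_chars, word_count, words_found) for token[:i].
    Ties on skipped characters are broken in favour of fewer (longer) words.
    """
    best = [(0, 0, [])]
    for i in range(1, len(token) + 1):
        # Option 1: character i-1 does not belong to any dictionary word.
        skipped, count, found = best[i - 1]
        candidate = (skipped + 1, count, found)
        # Option 2: a dictionary word of at least min_len letters ends at i.
        for j in range(0, i - min_len + 1):
            piece = token[j:i]
            if piece in words:
                skipped, count, found = best[j]
                option = (skipped, count + 1, found + [piece])
                candidate = min(candidate, option, key=lambda t: t[:2])
        best.append(candidate)
    return best[-1][2]


def words_in_email(email, words):
    """Return the dictionary words found in the part before the '@'."""
    handle = email.split("@", 1)[0].lower()
    found = []
    # Digits, dots, underscores etc. are treated as separators.
    for token in re.split(r"[^a-z]+", handle):
        found.extend(segment(token, words))
    return found


if __name__ == "__main__":
    for address in ["coolgirl@example.com", "angel.dream99@example.org"]:
        hits = words_in_email(address, WORDS)
        print(address, len(hits), hits)
```

Note that the answer depends entirely on the word list and the tie-breaking rule: with "halfpipe" in the list the handle counts as one word, and without it as two. That is exactly the decision you would have to make for your project.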
I think your best bet would be to investigate text analytics if you really need to do this, although it's another license:
04-23-2016 03:53 PM
Some word lists are available at https://sourceforge.net/projects/wordlist/files/latest/download?source=typ_redirect.
Is this for targeted marketing?
04-28-2016 03:19 AM
04-28-2016 05:41 AM
No, this is not for targeted marketing.
I am doing a project in which I need to determine the number of dictionary words in the email handle.
I am stuck on this question and don't know how to proceed.