I have a huge list of email addresses, and I want to determine the number of dictionary words in each address. The programming language I am using is SAS.
For example, suppose the email addresses are as below. The output I require is:
coolgirl@email.com --> 2 dictionary words - cool and girl
angeldream@gmail.com --> 2 dictionary words - angel and dream
Can anyone suggest how to go about it?
It's easy enough to get a list of words off the net; my first search came up with this:
https://github.com/dwyl/english-words
The real question is how you are going to parse a text string to find the words in it. There are many possible combinations, different meanings, different spellings, etc. Just take your example, coolgirl: what if it was coolaid? Is that two separate words, or the company name? What about halfpipe: should it be half and pipe, or halfpipe?
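To make the ambiguity concrete, here is a minimal sketch of one possible splitting rule. It is written in Python rather than SAS only to keep it short; the same table-filling logic could presumably be ported to a DATA step with a hash object for the word lookup. Everything in it is an assumption for illustration: the function name `split_handle`, the tiny `WORDS` set (a real run would load one of the word lists linked in this thread), the minimum word length of 3, and above all the tie-breaking rule, which prefers the split that leaves the fewest characters unexplained and then the fewest words (so halfpipe counts as one word if the list contains it).

```python
# Illustrative word set only; a real run would load a full word list.
WORDS = {"cool", "girl", "angel", "dream", "half", "pipe", "halfpipe"}


def split_handle(email, words, min_len=3):
    """Return the dictionary words found in the part of `email` before '@'."""
    handle = email.split("@", 1)[0].lower()
    n = len(handle)
    # best[i] = (unexplained_chars, word_count, words_found) for handle[:i]
    best = [(0, 0, [])]
    for i in range(1, n + 1):
        # Option 1: treat the character handle[i-1] as a leftover.
        skipped, count, found = best[i - 1]
        candidate = (skipped + 1, count, found)
        # Option 2: end a dictionary word of at least min_len letters at i.
        for j in range(0, i - min_len + 1):
            piece = handle[j:i]
            if piece in words:
                skipped, count, found = best[j]
                option = (skipped, count + 1, found + [piece])
                # Fewer leftover characters wins, then fewer words.
                if option[:2] < candidate[:2]:
                    candidate = option
        best.append(candidate)
    return best[n][2]


for address in ("coolgirl@email.com", "angeldream@gmail.com", "halfpipe@email.com"):
    found = split_handle(address, WORDS)
    print(address, "-->", len(found), "dictionary words:", found)
```

Changing the tie-break to prefer more words would turn halfpipe into half and pipe, so the count you get depends on a rule you have to choose up front, not only on the word list.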
I think your best bet would be to investigate text analytics if you really need to do this, although it's another license.
Some word lists are available at https://sourceforge.net/projects/wordlist/files/latest/download?source=typ_redirect.
Is this for targeted marketing?
No, this is not for targeted marketing.
I am doing a project in which I need to determine the number of dictionary words in the email handle.
I am stuck on this problem and don't know how to proceed.