I have a huge list of email addresses. I want to determine the number of dictionary words in each email address. The programming language I am using is SAS.
Example: suppose the email addresses are as below. The output I require is:
coolgirl@email.com --> 2 dictionary words - cool and girl
angeldream@gmail.com --> 2 dictionary words - angel and dream
Can anyone suggest how to go about it?
It's easy enough to get a list of words off the net; my first search came up with this:
https://github.com/dwyl/english-words
The question is, how are you going to lexicographically parse a text string to find words? There are many combinations, different meanings, different spellings, etc. Take your own example, coolgirl: what if it were coolaid? Is that two separate words, or the company name? What about halfpipe: should it be half and pipe, or halfpipe?
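To make the parsing question concrete, here is a minimal sketch of one possible approach: a greedy longest-match lookup against a word list held in a hash object. Everything named here is an assumption for illustration only: the data sets words, emails and word_counts, the variables handle, n_words and found, the three-letter minimum word length, and the tiny in-line word list, which stands in for whatever list you actually download. It is not a definitive solution, and greedy matching is only one of several ways to split a string.

```sas
/* Hypothetical toy word list; in practice, import a downloaded list instead. */
data words;
  length word $32;
  input word $;
  datalines;
cool
girl
angel
dream
half
pipe
;
run;

/* Hypothetical sample addresses. */
data emails;
  length email $100;
  input email $;
  datalines;
coolgirl@email.com
angeldream@gmail.com
;
run;

data word_counts;
  length word $32 handle $64 found $200;
  /* Load the word list into a hash object once, keyed on WORD. */
  if _n_ = 1 then do;
    declare hash dict(dataset: 'words');
    dict.defineKey('word');
    dict.defineDone();
  end;
  set emails;
  /* Keep only the part before the @ sign, in lower case. */
  handle = lowcase(scan(email, 1, '@'));
  n_words = 0;
  found = ' ';
  pos = 1;
  do while (pos <= length(handle));
    match_len = 0;
    /* Try the longest candidate first; assume words have at least 3 letters. */
    do len = min(32, length(handle) - pos + 1) to 3 by -1;
      word = substr(handle, pos, len);
      if dict.check() = 0 then do;   /* 0 means the key is in the word list */
        match_len = len;
        leave;
      end;
    end;
    if match_len > 0 then do;
      n_words = n_words + 1;
      found = catx(' ', found, word);
      pos = pos + match_len;         /* jump past the matched word */
    end;
    else pos = pos + 1;              /* no word starts here; skip one character */
  end;
  drop word pos len match_len;
run;
```

With the toy list above, both sample handles should come out with n_words = 2 (cool girl, angel dream). The ambiguity does not go away, though: if the list also contained halfpipe, the longest-match rule would count halfpipe as one word rather than half and pipe, and a handle like coolaid would be split according to whatever the list happens to contain.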
I think your best bet would be to investigate text analytics if you really need to do this, although that requires another license.
Some word lists are available at https://sourceforge.net/projects/wordlist/files/latest/download?source=typ_redirect.
Is this for targeted marketing?
No, this is not for targeted marketing.
I am doing a project in which I need to determine the number of dictionary words in the email handle.
I am stuck on this question and don't know how to proceed.