I have a huge list of email addresses. I want to determine the number of dictionary words in each email address. The programming language I am using is SAS.
Example: suppose the email addresses are as below. The output I require is:
coolgirl@email.com --> 2 dictionary words - cool and girl
angeldream@gmail.com --> 2 dictionary words - angel and dream
Can anyone suggest how to go about it?
It's easy enough to get a list of words off the net; my first search came up with this:
https://github.com/dwyl/english-words
The real question is how you are going to parse a text string to find the words in it. There are many possible combinations, different meanings, different spellings and so on. Just take your example, coolgirl: what if it was coolaid? Is that two separate words, or the company name? What about halfpipe: should it be half and pipe, or halfpipe?
I think your best bet would be to investigate text analytics if you really need to do this, although it's another license.
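To make the ambiguity concrete, here is a minimal sketch that lists every way a string can be split into dictionary words. It is written in Python purely for illustration, since no code has been posted in this thread; the function name `segmentations` and the tiny `WORDS` set are made up for the example and stand in for a real word list.

```python
def segmentations(text, words):
    """Return every way to split text into consecutive entries of words."""
    if not text:
        return [[]]  # one way to split an empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in words:
            # Keep this prefix and split the remainder in every possible way.
            for rest in segmentations(text[end:], words):
                results.append([prefix] + rest)
    return results


# Tiny hypothetical word list, for illustration only.
WORDS = {"half", "pipe", "halfpipe", "cool", "girl"}

print(segmentations("halfpipe", WORDS))  # two valid splits
print(segmentations("coolgirl", WORDS))  # one valid split
```

With this word set, halfpipe has two valid splits (half + pipe, and halfpipe on its own) while coolgirl has only one, so any count of "dictionary words" depends on a rule for choosing between splits.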
Some word lists are available at https://sourceforge.net/projects/wordlist/files/latest/download?source=typ_redirect.
Is this for targeted marketing?
No, this is not for targeted marketing.
I am doing a project in which I need to determine the number of dictionary words in the email handle.
I am stuck on this problem and don't know how to proceed.
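One possible way to proceed, sketched in Python for illustration only (the same logic could in principle be ported to a SAS DATA step with the word list loaded into a lookup table). The function name `find_words` and the small `WORDS` set are hypothetical, and the tie-breaking rule is an assumption: cover as many characters of the handle as possible with dictionary words, and among equal coverage prefer fewer, longer words.

```python
def find_words(email, words):
    """Return the dictionary words found in the part of email before '@'."""
    handle = email.split("@", 1)[0].lower()
    # best[i] = (characters covered, -word count, words found) for handle[:i]
    best = [(0, 0, [])]
    for i in range(1, len(handle) + 1):
        # Option 1: treat handle[i-1] as a non-word character and skip it.
        candidate = best[i - 1]
        # Option 2: end a dictionary word at position i.
        for j in range(i):
            piece = handle[j:i]
            if piece in words:
                covered, neg_count, found = best[j]
                option = (covered + len(piece), neg_count - 1, found + [piece])
                # Assumed rule: more coverage first, then fewer words.
                if option[:2] > candidate[:2]:
                    candidate = option
        best.append(candidate)
    return best[-1][2]


# Tiny hypothetical word list; a real run would load a downloaded list.
WORDS = {"cool", "girl", "angel", "dream"}

for address in ["coolgirl@email.com", "angeldream@gmail.com"]:
    found = find_words(address, WORDS)
    print(address, "-->", len(found), "dictionary words -", " and ".join(found))
```

Downloaded word lists usually contain very short entries (single letters, two-letter words), so in practice it may be worth dropping words below some minimum length before matching, otherwise almost any handle will appear to be full of "words".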