nakulkothari
Calcite | Level 5

I have a huge list of email addresses, and I want to determine the number of dictionary words in each one. The programming language I am using is SAS.

For example, suppose the email addresses are as below. The output I require is:

coolgirl@email.com --> 2 dictionary words: cool and girl
angeldream@gmail.com --> 2 dictionary words: angel and dream

Can anyone suggest how to go about it?
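One way to frame this is as word segmentation: take the part of the address before the @ and split it into dictionary words. The sketch below is written in Python purely to illustrate the logic (it is not SAS code). The function names `handle_of` and `segment`, the toy word set, dropping digits and punctuation from the handle, and the rule "prefer the split with the fewest words" are all assumptions for illustration. In a SAS DATA step, the dictionary lookup would typically be done with a hash object loaded from a word-list data set.

```python
from __future__ import annotations

from functools import lru_cache


def handle_of(email: str) -> str:
    """Return the part before '@', lowercased, keeping letters only (assumption)."""
    local = email.split("@", 1)[0].lower()
    return "".join(ch for ch in local if ch.isalpha())


def segment(text: str, words: set[str]) -> list[str] | None:
    """Split text into dictionary words, using as few words as possible.

    Returns None when the text cannot be covered completely by dictionary words.
    """

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[str, ...] | None:
        # Shortest segmentation of text[i:], or None if there is none.
        if i == len(text):
            return ()
        result = None
        for j in range(len(text), i, -1):  # try longer pieces first; ties keep the longer piece
            piece = text[i:j]
            if piece in words:
                rest = best(j)
                if rest is not None and (result is None or len(rest) + 1 < len(result)):
                    result = (piece,) + rest
        return result

    found = best(0)
    return list(found) if found is not None else None


# Hypothetical toy dictionary; a real run would load a full word list.
WORDS = {"cool", "girl", "angel", "dream", "an", "gel"}

for email in ["coolgirl@email.com", "angeldream@gmail.com"]:
    parts = segment(handle_of(email), WORDS)
    count = len(parts) if parts else 0
    print(f"{email} --> {count} dictionary words: {parts}")
```

Preferring the fewest words is what keeps angeldream as angel + dream rather than an + gel + dream. Handles that cannot be fully covered return None; you might instead report 0 for those, or count partial matches, depending on what the project needs.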

RW9
Diamond | Level 26

It's easy enough to get a list of words off the net; my first search came up with this:

https://github.com/dwyl/english-words

The real question is: how are you going to split a text string into words? There are many possible combinations, different meanings, different spellings, etc. Just take your example, coolgirl: what if it were coolaid? Is that two separate words, or the brand name? What about halfpipe: should it be half and pipe, or halfpipe?
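To make the ambiguity concrete, here is a small Python sketch (illustrative only, not SAS) that lists every way a handle can be split completely into dictionary words. The function name `all_segmentations` and the toy dictionary are hypothetical. With half, pipe and halfpipe all in the list, halfpipe has two valid readings, so some rule (fewest words, longest first match, word frequencies) has to choose between them.

```python
from __future__ import annotations


def all_segmentations(text: str, words: set[str]) -> list[list[str]]:
    """Return every way to split text completely into dictionary words."""
    if not text:
        return [[]]  # one way to split the empty string: no words
    results: list[list[str]] = []
    for j in range(1, len(text) + 1):
        head = text[:j]
        if head in words:
            # Combine this word with every valid split of the remainder.
            for tail in all_segmentations(text[j:], words):
                results.append([head] + tail)
    return results


# Hypothetical toy dictionary containing a compound and its parts.
WORDS = {"half", "pipe", "halfpipe", "cool", "aid"}

for handle in ["halfpipe", "coolaid"]:
    print(handle, all_segmentations(handle, WORDS))
```

The number of segmentations can grow exponentially with handle length, so this is only for inspecting a few short handles, not for scoring a huge list.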

I think your best bet, if you really need to do this, would be to investigate text analytics, although that requires another license:

http://www.sas.com/en_us/software/analytics/text-miner.html

PGStats
Opal | Level 21

Some word lists are available at https://sourceforge.net/projects/wordlist/files/latest/download?source=typ_redirect
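Whichever list is used, it usually needs some cleanup first. A minimal Python sketch, assuming the download has been unpacked into a plain-text file with one word per line; the function name `load_words`, the file name and the `min_len=3` cutoff are hypothetical choices. The cutoff matters because many word lists include single letters and very short entries, and with those in the dictionary almost any handle can be chopped into lots of tiny "words". In SAS, the equivalent would be a DATA step reading the file with INFILE/INPUT and applying the same filter.

```python
from __future__ import annotations

from typing import Iterable


def load_words(lines: Iterable[str], min_len: int = 3) -> set[str]:
    """Build a lowercase word set from one-word-per-line text.

    Entries shorter than min_len, or containing anything other than letters
    (apostrophes, hyphens, digits), are skipped.
    """
    words: set[str] = set()
    for line in lines:
        word = line.strip().lower()
        if len(word) >= min_len and word.isalpha():
            words.add(word)
    return words


# A real run would pass an open file, e.g.
#   with open("wordlist.txt", encoding="utf-8") as f:  # hypothetical file name
#       WORDS = load_words(f)
sample = ["Cool\n", "girl\n", "a\n", "an\n", "don't\n", "angel\n"]
print(sorted(load_words(sample)))  # ['angel', 'cool', 'girl']
```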

Is this for targeted marketing? 

PG
nakulkothari
Calcite | Level 5
No, this is not for targeted marketing.

I am doing a project in which I need to determine the number of dictionary words in each email handle.

I am stuck on this and don't know how to proceed.
