Is there a more condensed way to write this? I want to know if I can do the "keep" group all in one line instead of needing to duplicate it for each variation of email domain.
CASE
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.COM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MIL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.BIZ' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.NET' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ORG' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.EDU' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.GOV' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.PRO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.INFO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.US' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.CA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.AERO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ARPA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ASIA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.CAT' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.COOP' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.INT' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.JOBS' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MOBI' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MUSEUM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.NAME' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.TEL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.TRAVEL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.UM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'POPULATED_COUNTRY' THEN 'KEEP'
ELSE 'REMOVE'
END
case when t2.nvl_country_group = 'NULL' and upcase(scan(t1.email_address, 2, '.@')) in ('COM', 'MIL', BIZ', 'NET', .... , 'UM') then 'KEEP'
else 'REMOVE'
end;
Couple of other options:
1. Create a data set with the list of domains you want to include in the strings and use that instead of typing it into your code. It makes it a bit easier to maintain in the long run.
2. From #1 you could create a format that would assign the KEEP/REMOVE values
3. Test NULL condition only once instead of every line, nest the CASE or IF/THEN statements if necessary.
@agoldberg2018 wrote:
Is there a more condensed way to write this? I want to know if I can do the "keep" group all in one line instead of needing to duplicate it for each variation of email domain.
CASE
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.COM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MIL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.BIZ' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.NET' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ORG' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.EDU' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.GOV' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.PRO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.INFO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.US' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.CA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.AERO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ARPA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ASIA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.CAT' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.COOP' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.INT' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.JOBS' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MOBI' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MUSEUM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.NAME' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.TEL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.TRAVEL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.UM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'POPULATED_COUNTRY' THEN 'KEEP'
ELSE 'REMOVE'
END
case when t2.nvl_country_group = 'NULL' and upcase(scan(t1.email_address, 2, '.@')) in ('COM', 'MIL', BIZ', 'NET', .... , 'UM') then 'KEEP'
else 'REMOVE'
end;
Couple of other options:
1. Create a data set with the list of domains you want to include in the strings and use that instead of typing it into your code. It makes it a bit easier to maintain in the long run.
2. From #1 you could create a format that would assign the KEEP/REMOVE values
3. Test NULL condition only once instead of every line, nest the CASE or IF/THEN statements if necessary.
@agoldberg2018 wrote:
Is there a more condensed way to write this? I want to know if I can do the "keep" group all in one line instead of needing to duplicate it for each variation of email domain.
CASE
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.COM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MIL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.BIZ' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.NET' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ORG' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.EDU' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.GOV' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.PRO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.INFO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.US' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.CA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.AERO' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ARPA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.ASIA' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.CAT' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.COOP' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.INT' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.JOBS' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MOBI' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.MUSEUM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.NAME' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.TEL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.TRAVEL' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'NULL' and t1.EMAIL_ADDRESS LIKE '%.UM' THEN 'KEEP'
WHEN t1.NVL_COUNTRY_GROUP = 'POPULATED_COUNTRY' THEN 'KEEP'
ELSE 'REMOVE'
END
@Reeza can you explain to me what
SCAN(t1.EMAIL_ADDRESS, 2, '.@')
means?
It actually didn't work so I am wondering if the .@ is wrong.. is there a link you know of I can reference for a deep explanation of this?
Thanks for your help!
SCAN() is trying to get the component of the email.
Emails are usually in the form of :
xyz@domain.TLD
@scan() says that the components can be separated by the fields in the third parameter, @ and dot (.)
2 says to take the second one.
XYX - Domain - TLD
So 2 would return the domain portion.
Looking back at your question you actually want the last portion so you can change it to 3.
If you have addresses of the form:
ABC.XYZ@DOMAIN.TLD you can use -1 to get the last component instead.
scan(t1.email_address, -1, '.@')
@agoldberg2018 wrote:
@Reeza can you explain to me what
SCAN(t1.EMAIL_ADDRESS, 2, '.@')
means?
It actually didn't work so I am wondering if the .@ is wrong.. is there a link you know of I can reference for a deep explanation of this?
Thanks for your help!
Yes! The -1 one makes the most sense because of people who separate words in the email address like firstname.lastname@abc.com
Thank you again for taking the time to explain this to me! I greatly appreciate the knowledge.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.