Hi,
I have about two million records (email IDs), and some of them are invalid email addresses (for example, the '@' is missing, or the '.com' or '.net' is missing). The addresses also vary in length. My questions are:
1) How do I identify the invalid email IDs?
2) How do I delete the invalid email IDs?
3) How do I create two separate data sets, one for the invalid addresses and one for the valid ones?
Can anyone please help with the logic (code)?
Thanks,
Suresh
For finding valid addresses, I think the Perl regular expression functions (prxparse, prxmatch) would be a good way.
You can write observations to different data sets within the same data step.
data A B;                        /* name both output data sets */
set C;
if condition then output A;      /* condition is a placeholder for your validity test */
else output B;
run;
Good luck!
data eml;
input eml $20.;
cards;
bad@email.sdglkjasd
notanemail.com
;
run;
data good bad;
set eml;
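/* Local part and domain must each start and end with a word character; */
/* the last dot-separated label must be 2 to 4 word characters, so longer TLDs are flagged as bad. */
if not prxmatch('/^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/',strip(eml)) then output bad;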
else output good;
run;
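For question 2 (deleting the invalid addresses rather than splitting them out), here is a minimal sketch that assumes the same eml data set and the same pattern as in the step above. The names eml_clean and re are made up for illustration. It compiles the pattern once with prxparse and drops every row that does not match.

```sas
data eml_clean;
    set eml;
    retain re;
    /* compile the pattern once, on the first observation */
    if _n_ = 1 then re = prxparse('/^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/');
    /* delete stops processing this row, so it is never written out */
    if not prxmatch(re, strip(eml)) then delete;
    drop re;   /* the pattern id is not needed in the output */
run;
```

With the two sample rows above, both fail the pattern, so eml_clean would end up with no observations. Whether this pattern is strict or loose enough for your data is something to check against a sample of real addresses.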