Grandhi4
Calcite | Level 5

Hi,

I have about two million records (email IDs), and some of them are invalid addresses (for example, '@' missing, '.com' missing, '.net' missing, ...). The addresses also vary in length. My questions are:

1) How do I identify the invalid ("error") email IDs?

2) How do I delete the error email IDs?

3) How do I create two separate data sets, one for the error IDs and one for the valid ones?

Can anyone please help with the logic (code)?

Thanks,

Suresh

2 replies
RickM
Fluorite | Level 6

For finding valid addresses, I think the Perl regular expression functions (prxparse, prxmatch) would be a good way.

You can output data to different datasets within the same data step.

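data A B;
   set C;
   /* "condition" is a placeholder: put your validity test here, e.g. a PRXMATCH call */
   if condition then output A;   /* records that pass the test go to A */
   else output B;                /* all other records go to B */
run;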

Good luck!
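A minimal sketch of how prxparse and prxmatch combine with the two-output DATA step described above. The data set names (emails, valid, invalid), the variable name email, the sample records, and the pattern itself are assumptions for illustration. The pattern is a deliberately loose check (exactly one @, no blanks, and a final dot followed by two or more letters), not a complete email validator. PRXPARSE compiles the pattern once on the first iteration, and the retained pattern ID is reused for every later observation.

```sas
/* Hypothetical sample data; with your real data, SET your existing data set instead. */
data emails;
   input email $60.;   /* formatted input keeps embedded blanks, e.g. "b ad@space.net" */
   datalines;
valid@email.com
no-at-sign.com
b ad@space.net
user@domain
;
run;

/* Compile the (assumed) pattern once with PRXPARSE, then reuse its ID in PRXMATCH. */
data valid invalid;
   set emails;
   retain re;   /* keep the compiled pattern ID across iterations */
   if _n_ = 1 then re = prxparse('/^[^@\s]+@[^@\s]+\.[a-z]{2,}$/i');
   /* STRIP removes the trailing blanks SAS pads character values with, so $ can match */
   if prxmatch(re, strip(email)) then output valid;
   else output invalid;
   drop re;
run;
```

With these four sample records, only valid@email.com lands in VALID; the other three go to INVALID. Tighten or loosen the pattern to match your own definition of a valid address.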

FriedEgg
SAS Employee

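data eml;
   /* $20. reads 20 columns, so embedded blanks (as in "b ad@space.net") are kept;
      widen it if your real addresses can be longer than 20 characters */
   input eml $20.;
   cards;
valid@email.com
bad@email.sdglkjasd
b ad@space.net
notanemail.com
;
run;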

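data good bad;
   set eml;
   /* The pattern requires: a local part of at least two characters that starts and
      ends with a word character (letter, digit or _), with dots or hyphens allowed
      in between; an @; a domain built the same way; and a final dot followed by a
      2- to 4-character suffix such as .com or .net. Addresses with one-character
      local parts or longer suffixes (e.g. .museum) are therefore sent to BAD.
      STRIP removes the trailing blanks SAS pads character values with, so $ can match. */
   if ^prxmatch('/^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/',strip(eml)) then output bad;
   else output good;
run;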
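If you also want to know why each address failed (the question lists missing '@' and missing '.com'/'.net' as examples), one option is to record a reason per record and then split on it. This is a sketch under stated assumptions: the data set names (checked, good, bad), the variable names email and reason, the sample records, and the individual checks are all hypothetical choices for illustration, applied in order so each record gets the first reason that applies.

```sas
/* Hypothetical sample records; with your real data, SET your existing data set instead. */
data checked;
   input email $60.;
   length reason $25;
   /* Each test below is an illustrative assumption, checked in order;
      REASON stays blank when every test passes. */
   if index(strip(email), ' ') > 0 then reason = 'contains a blank';
   else if countc(email, '@') = 0 then reason = 'missing @';
   else if countc(email, '@') > 1 then reason = 'more than one @';
   else if not prxmatch('/@[^@]+\.[a-z]{2,}$/i', strip(email))
      then reason = 'missing domain suffix';
   datalines;
valid@email.com
notanemail.com
b ad@space.net
two@at@signs.com
user@domain
;
run;

/* Questions 2 and 3: records with a reason go to BAD, so GOOD holds only the
   addresses that passed every check. */
data good bad;
   set checked;
   if missing(reason) then output good;
   else output bad;
run;
```

If you only need to delete the bad records from a single data set, name one output data set and replace the IF/ELSE with: if not missing(reason) then delete;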

