BookmarkSubscribeRSS Feed
rawindar
Calcite | Level 5

Hi all,

i have a mailid dataset as follows  with coloumn mail  having mail id's and missing values and having invalid data.

   mail

    rawindar

  

    raw...

    rawindar@gmail.com

    a...

    abc@ymail.com

     ;

Now i want to get  valid mail id's  and i want to delete all other invalid.How can i get it.Can anybody help me plz

Thanks&Regards

Rawindarreddy

13 REPLIES 13
Patrick
Opal | Level 21

Email address validation is normally (not only in SAS) done using a Regular Expression (RegEx). In SAS you can use functions prxparse() and prxmatch() to define and apply regular expressions.

Should you never have used regular expressions before then you will first need to read a bit about the topic in order to understand what it does.

I've just Googled with the following keywords: "perl regex email validation". This gives you links with regular expressions for email validation. Just use an appropriate one.

Here the SAS Regular Expression Tip Sheet:  http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

You can find more about RegEx in the normal SAS documentation.

Haikuo
Onyx | Level 15

Considering it could be a daunting task if you have never done regex before, To follow up Patrick's suggestion and to get you started with ease, here is an example:

data have;

input emailid$50.;

cards;

aaikfhgiug

akdjf@akdfh.com

862495

akdjf@akfhd

aifghtd@ci_asdklf@co

af_$@yahoo.com

aksdh89345@gmail.com

adhf@yahoo.co.au

;

data want;

set have;

if prxmatch("/[^@]+@[^@]+\.[^@]+/", emailid) = 1 then output;

run;

proc print;run;

Note: you may need to construct your own rules of being qualified as email id. The code above has adopted the following rules:

one or more non '@' character or digit

followed by

'@'

followed by

one or more non '@' character or digit

followed by

'.'

followed by

one or more non '@' character or digit

Regards,

Haikuo

rawindar
Calcite | Level 5

hi hai.kuo

            i didnt understand what that job of praxmatch.I didnt understand how it works and what happening here.Can you help me to understand this code.

Thanks &Regards

Rawindarreddy

Patrick
Opal | Level 21

prxmatch() is the SAS function used to test if a regular expression matches with a string.

A regular expression allows you to define a text pattern - and with prxmatch() you then test if a string matches this pattern.

The syntax for Regular Expressions is in the beginning a bit hard to understand. But Regular Expressions are really something useful and it's worth learning at least the basics.

Haikuo
Onyx | Level 15

Hi,

Like I said, it could be overwhelming if you have never touched RegEx, while it is very powerful and not as hard to learn as it seems. You could just start from learning prxmatch:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002296115.htm

To have a solid understanding on the concept, you can get any book on Perl, the one I like is called 'perl black book', don't have it on hand, so can't get the exact name. Also, the cheat sheet provided in Patrick's link is very handy, but it has be after you know something about regex.

Also, Toby Dunn of SAS-L will have one book on SAS RegEX published (or maybe already ?). I suppose that would be a great book focusing on regex applications with SAS.

Good luck!

Haikuo

AlinaValencia
Calcite | Level 5

Hi, 

Here is another good one:

data email_data; input email $50.;

datalines; user@example.com another_user@test.org invalid_email ;

run;

 

data matched_emails;

set email_data;

if prxmatch('/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/', email) > 0;

run;

kuridisanjeev
Quartz | Level 8

Dear Rawindarreddy.,

hope this will help you.

data test;

input mailid $;

cards;

aaa@yahoo.com

noid

.

.

xdfg@gmail.com

ddd@yahoo.com

ggg@gmail.com

rererrrrerer

;

run;

proc sql;

select * from test

where mailid contains '@';

quit;

Patrick
Opal | Level 21

May be something like this would give a bit better results:

where mailid like '_@%._'

Still not perfect and invalid mailaddresses can still pass the test (as said: RegEx is what's needed).

rawindar
Calcite | Level 5

Thanks Patrick.I already tried it, its working but iam looking for any data step method instead of proc step.

rawindar
Calcite | Level 5

thanks sanjeev,

                     But in case column contains values like 'abc@' means with out domain names we cant get values right.

Peter_C
Rhodochrosite | Level 12

Rawindar

have you tried the code suggested by Patrick?

where mailid like '_@%._'

rawindar
Calcite | Level 5

Yes Peter its working iam looking for to solve this in data step.But i couldnt understand that above functions.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 7406 views
  • 1 like
  • 6 in conversation