DATA Step, Macro, Functions and more

how to get valid mail ids from a coloumn

Reply
Contributor
Posts: 70

how to get valid mail ids from a coloumn

Hi all,

i have a mailid dataset as follows  with coloumn mail  having mail id's and missing values and having invalid data.

   mail

    rawindar

  

    raw...

    rawindar@gmail.com

    a...

    abc@ymail.com

     ;

Now i want to get  valid mail id's  and i want to delete all other invalid.How can i get it.Can anybody help me plz

Thanks&Regards

Rawindarreddy

Respected Advisor
Posts: 3,892

Re: how to get valid data

Email address validation is normally (not only in SAS) done using a Regular Expression (RegEx). In SAS you can use functions prxparse() and prxmatch() to define and apply regular expressions.

Should you never have used regular expressions before then you will first need to read a bit about the topic in order to understand what it does.

I've just Googled with the following keywords: "perl regex email validation". This gives you links with regular expressions for email validation. Just use an appropriate one.

Here the SAS Regular Expression Tip Sheet:  http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

You can find more about RegEx in the normal SAS documentation.

Respected Advisor
Posts: 3,124

Re: how to get valid data

Considering it could be a daunting task if you have never done regex before, To follow up Patrick's suggestion and to get you started with ease, here is an example:

data have;

input emailid$50.;

cards;

aaikfhgiug

akdjf@akdfh.com

862495

akdjf@akfhd

aifghtd@ci_asdklf@co

af_$@yahoo.com

aksdh89345@gmail.com

adhf@yahoo.co.au

;

data want;

set have;

if prxmatch("/[^@]+@[^@]+\.[^@]+/", emailid) = 1 then output;

run;

proc print;run;

Note: you may need to construct your own rules of being qualified as email id. The code above has adopted the following rules:

one or more non '@' character or digit

followed by

'@'

followed by

one or more non '@' character or digit

followed by

'.'

followed by

one or more non '@' character or digit

Regards,

Haikuo

Contributor
Posts: 70

Re: how to get valid data

hi hai.kuo

            i didnt understand what that job of praxmatch.I didnt understand how it works and what happening here.Can you help me to understand this code.

Thanks &Regards

Rawindarreddy

Respected Advisor
Posts: 3,892

Re: how to get valid data

prxmatch() is the SAS function used to test if a regular expression matches with a string.

A regular expression allows you to define a text pattern - and with prxmatch() you then test if a string matches this pattern.

The syntax for Regular Expressions is in the beginning a bit hard to understand. But Regular Expressions are really something useful and it's worth learning at least the basics.

Respected Advisor
Posts: 3,124

Re: how to get valid data

Hi,

Like I said, it could be overwhelming if you have never touched RegEx, while it is very powerful and not as hard to learn as it seems. You could just start from learning prxmatch:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002296115.htm

To have a solid understanding on the concept, you can get any book on Perl, the one I like is called 'perl black book', don't have it on hand, so can't get the exact name. Also, the cheat sheet provided in Patrick's link is very handy, but it has be after you know something about regex.

Also, Toby Dunn of SAS-L will have one book on SAS RegEX published (or maybe already ?). I suppose that would be a great book focusing on regex applications with SAS.

Good luck!

Haikuo

Super Contributor
Posts: 276

Re: how to get valid mail ids from a coloumn

Dear Rawindarreddy.,

hope this will help you.

data test;

input mailid $;

cards;

aaa@yahoo.com

noid

.

.

xdfg@gmail.com

ddd@yahoo.com

ggg@gmail.com

rererrrrerer

;

run;

proc sql;

select * from test

where mailid contains '@';

quit;

Respected Advisor
Posts: 3,892

Re: how to get valid mail ids from a coloumn

May be something like this would give a bit better results:

where mailid like '_@%._'

Still not perfect and invalid mailaddresses can still pass the test (as said: RegEx is what's needed).

Contributor
Posts: 70

Re: how to get valid mail ids from a coloumn

Thanks Patrick.I already tried it, its working but iam looking for any data step method instead of proc step.

Contributor
Posts: 70

Re: how to get valid mail ids from a coloumn

thanks sanjeev,

                     But in case column contains values like 'abc@' means with out domain names we cant get values right.

Valued Guide
Posts: 2,175

Re: how to get valid mail ids from a coloumn

Rawindar

have you tried the code suggested by Patrick?

where mailid like '_@%._'

Contributor
Posts: 70

Re: how to get valid mail ids from a coloumn

Yes Peter its working iam looking for to solve this in data step.But i couldnt understand that above functions.

Respected Advisor
Posts: 3,892

Re: how to get valid mail ids from a coloumn

Ask a Question
Discussion stats
  • 12 replies
  • 2272 views
  • 0 likes
  • 5 in conversation