Hi all,
i have a mailid dataset as follows with coloumn mail having mail id's and missing values and having invalid data.
rawindar
raw...
a...
;
Now i want to get valid mail id's and i want to delete all other invalid.How can i get it.Can anybody help me plz
Thanks&Regards
Rawindarreddy
Email address validation is normally (not only in SAS) done using a Regular Expression (RegEx). In SAS you can use functions prxparse() and prxmatch() to define and apply regular expressions.
Should you never have used regular expressions before then you will first need to read a bit about the topic in order to understand what it does.
I've just Googled with the following keywords: "perl regex email validation". This gives you links with regular expressions for email validation. Just use an appropriate one.
Here the SAS Regular Expression Tip Sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
You can find more about RegEx in the normal SAS documentation.
Considering it could be a daunting task if you have never done regex before, To follow up Patrick's suggestion and to get you started with ease, here is an example:
data have;
input emailid$50.;
cards;
aaikfhgiug
862495
akdjf@akfhd
aifghtd@ci_asdklf@co
;
data want;
set have;
if prxmatch("/[^@]+@[^@]+\.[^@]+/", emailid) = 1 then output;
run;
proc print;run;
Note: you may need to construct your own rules of being qualified as email id. The code above has adopted the following rules:
one or more non '@' character or digit
followed by
'@'
followed by
one or more non '@' character or digit
followed by
'.'
followed by
one or more non '@' character or digit
Regards,
Haikuo
hi hai.kuo
i didnt understand what that job of praxmatch.I didnt understand how it works and what happening here.Can you help me to understand this code.
Thanks &Regards
Rawindarreddy
prxmatch() is the SAS function used to test if a regular expression matches with a string.
A regular expression allows you to define a text pattern - and with prxmatch() you then test if a string matches this pattern.
The syntax for Regular Expressions is in the beginning a bit hard to understand. But Regular Expressions are really something useful and it's worth learning at least the basics.
Hi,
Like I said, it could be overwhelming if you have never touched RegEx, while it is very powerful and not as hard to learn as it seems. You could just start from learning prxmatch:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002296115.htm
To have a solid understanding on the concept, you can get any book on Perl, the one I like is called 'perl black book', don't have it on hand, so can't get the exact name. Also, the cheat sheet provided in Patrick's link is very handy, but it has be after you know something about regex.
Also, Toby Dunn of SAS-L will have one book on SAS RegEX published (or maybe already ?). I suppose that would be a great book focusing on regex applications with SAS.
Good luck!
Haikuo
Hi,
Here is another good one:
data email_data; input email $50.;
datalines; user@example.com another_user@test.org invalid_email ;
run;
data matched_emails;
set email_data;
if prxmatch('/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/', email) > 0;
run;
Dear Rawindarreddy.,
hope this will help you.
data test;
input mailid $;
cards;
noid
.
.
rererrrrerer
;
run;
proc sql;
select * from test
where mailid contains '@';
quit;
May be something like this would give a bit better results:
where mailid like '_@%._'
Still not perfect and invalid mailaddresses can still pass the test (as said: RegEx is what's needed).
Thanks Patrick.I already tried it, its working but iam looking for any data step method instead of proc step.
thanks sanjeev,
But in case column contains values like 'abc@' means with out domain names we cant get values right.
Yes Peter its working iam looking for to solve this in data step.But i couldnt understand that above functions.
data want;
set have;
where mailid like '%@%.%';
run;
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202951.htm
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.