BookmarkSubscribeRSS Feed
AB1976
Calcite | Level 5

Good day I would like assistance in transforming this SQL code into a code suitable for SAS.

 

I am trying to identify email addresses that do not conform to the validation rules set out for a valid email address.

 
CASE 
WHEN [EMAIL_ADRES_X] LIKE '%_@__%.__%' AND PATINDEX('\b[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}\b', [EMAIL_ADRES_X]) = 0 THEN 1
ELSE 0
END AS email_conformity
FROM #T2 

 

4 REPLIES 4
quickbluefish
Barite | Level 11

I don't remember the syntax for the PRXMATCH function offhand, but you'd do something like this - just multiplying these two binary checks (>0 and =0) together to create a 0/1 variable called email_conformity.  Note that "email_address" is assumed to be a variable (column) in your input data and input data is called 'emailaddr':

 

data emailaddr;
set emailaddr;
email_conformity=(
    prxmatch(YOUR EMAIL REGEX, email_address)>0
    *
    prxmatch(YOUR OTHER REGEX, email_address)=0
    );
run;
PaigeMiller
Diamond | Level 26

It would be better if you explained the rules you want to follow, rather than providing PRX code, which I am unfamiliar with.

--
Paige Miller
Tom
Super User Tom
Super User

Are you asking how to translate the PATINDEX() function that is included as an SQL enhancement in your current SQL dialect into a SAS function? 

 

You might try PRXMATCH function.

 

Or are you asking what to do about those strange square brackets and hash mark?  For those I think you want to replace them with the actual names of your variables and dataset.

Patrick
Opal | Level 21

@AB1976 Something like below should work.

proc sql;
  select 
  CASE 
  WHEN 
      EMAIL_ADRES_X LIKE '%_@__%.__%' 
      AND prxmatch('/\b[a-zA-Z0-9\._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}\b/i', strip(EMAIL_ADRES_X)) = 1 
      THEN 1
      ELSE 0
      END AS email_conformity
  FROM have
  ;
quit;

Like SQL Server patindex() SAS prxmatch() will return the starting position of a match or zero if it isn't found. 

From the looks of it your RegEx got some "issues" - especially the full stop that's not masked and though stands for any character.

 

Instead of inventing a email validation RegEx yourself I'd Google for it. Below one way how you could modify your current syntax.

data have;
  EMAIL_ADRES_X='firstname.lastname@somecompany.com'; output;
  EMAIL_ADRES_X='..@somecompany.com'; output;

run;

proc sql;
  select 
  CASE 
  WHEN 
    prxmatch('/^[a-z0-9][\w\.-]+@[\w\.-]+\.[a-z]{2,6}$/oi', strip(EMAIL_ADRES_X)) = 1 THEN 1
    ELSE 0
    END AS email_conformity
  FROM have
  ;
quit;