<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS EG to Clean Email Addresses in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304702#M20672</link>
    <description>&lt;P&gt;Fair statement- without combing through, manually, hundreds of thousands of rows of data, do you have a solution that can help me out?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also realized that I will have other situations like .gov, .ca, .edu....so perhaps the question is more to remove non alphabetic characters at the end of the email string, regardless of email info.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Make sense?&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 14 Oct 2016 15:25:12 GMT</pubDate>
    <dc:creator>tessa_h</dc:creator>
    <dc:date>2016-10-14T15:25:12Z</dc:date>
    <item>
      <title>SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304693#M20669</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Forgive me as I am a new SAS EG user (version 7.1); I have been assigned a data scrubbing task and it's been requested that I use SAS EG in order to do so.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've been moving forward fairly well; however, I've encountered a problem that I cannot seem to intuitively resolve. I have several data sets which all have email addresses. Some of my email addresses contain junk characters which I have used the compress function to remove.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Unfortunately, I also have email addresses where I'd like to remove some characters based on where in the email address they are positioned.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here are 2 examples:&lt;/P&gt;&lt;P&gt;Email Address: ..johndoe@gmail.com&lt;/P&gt;&lt;P&gt;Email Address: johndoe@gmail.com..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Obviously you can see what I need to do.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'd like to use an advanced expression to&lt;/P&gt;&lt;P&gt;1. Remove all non-alphabetic characters before the first alphabetic character in the email string&lt;/P&gt;&lt;P&gt;2. Remove all characters after ".com"&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thoughts?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tessa&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 14:56:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304693#M20669</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-14T14:56:44Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304700#M20671</link>
      <description>&lt;P&gt;Before blindly going after "everything after .com" you may want to verify that you have nothing like&lt;/P&gt;
&lt;P&gt;&lt;A href="mailto:bill@this.companyname.com" target="_blank"&gt;bill@this.companyname.com&lt;/A&gt; in your data.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 15:22:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304700#M20671</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-10-14T15:22:30Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304702#M20672</link>
      <description>&lt;P&gt;Fair statement- without combing through, manually, hundreds of thousands of rows of data, do you have a solution that can help me out?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also realized that I will have other situations like .gov, .ca, .edu....so perhaps the question is more to remove non alphabetic characters at the end of the email string, regardless of email info.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Make sense?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 15:25:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304702#M20672</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-14T15:25:12Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304711#M20673</link>
      <description>&lt;P&gt;Using regular expressions is probably better than the approach below. What about numbers at the beginning of an email address, like "123test@yahoo.com"? &amp;nbsp;If numbers need to be kept, replace 'anyalpha' with 'anyalnum'.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data email_1;&lt;BR /&gt;length email $40;&lt;BR /&gt;infile cards;&lt;BR /&gt;input email $;&lt;BR /&gt;cards;&lt;BR /&gt;..johndoe@gmail.com&lt;BR /&gt;johndoe@gmail.com..&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;BR /&gt;create table email_1a as select&lt;BR /&gt; t1.email,&lt;BR /&gt;reverse(substr(reverse(trim(substr(t1.email,anyalpha(t1.email),length(t1.email) - anyalpha(t1.email)+1))),anyalpha(reverse(trim(substr(t1.email,anyalpha(t1.email),length(t1.email) - anyalpha(t1.email)+1)))),length(reverse(trim(substr(t1.email,anyalpha(t1.email),length(t1.email) - anyalpha(t1.email)+1)))) - anyalpha(reverse(trim(substr(t1.email,anyalpha(t1.email),length(t1.email) - anyalpha(t1.email)+1))))+1)) as email_corr&lt;BR /&gt;from email_1 t1&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 16:33:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304711#M20673</guid>
      <dc:creator>cjinsf</dc:creator>
      <dc:date>2016-10-14T16:33:18Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304715#M20674</link>
      <description>&lt;P&gt;As a start, you could use the COMPRESS function as shown below. It removes every character except letters, digits and punctuation, anywhere in your email. One should probably also check what makes up a valid email address; see &lt;A href="https://en.wikipedia.org/wiki/Email_address" target="_blank"&gt;https://en.wikipedia.org/wiki/Email_address&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Maybe there are Regular expressions that check for valid email addresses as well. You could use those with the PRX... functions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also have a look at this discussion&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Procedures/validate-email-address/m-p/37459#U37459" target="_blank"&gt;https://communities.sas.com/t5/SAS-Procedures/validate-email-address/m-p/37459#U37459&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  length email newEmail $ 256;
  email = cats("090a0d"x, "sugus-sugus_sugus.1234@sugus.com.edu", "01"x);

  newEmail = compress(email, "" , "adpk");
  putlog _all_;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Bruno&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 16:32:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304715#M20674</guid>
      <dc:creator>BrunoMueller</dc:creator>
      <dc:date>2016-10-14T16:32:35Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304751#M20676</link>
      <description>&lt;P&gt;Thanks, Bruno, for your advice...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Couple of things- I'm already using the compress function to remove some characters. This has been successful so far...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Second, the link to the discussion has an article in it- the link goes to an error page. Not too useful &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A colleague of mine pointed me to &lt;A href="https://heuristically.wordpress.com/2013/12/03/email-address-normalization-in-sas/" target="_self"&gt;this article&lt;/A&gt;, however, as a new to SAS user, it will take some time to fully understand and implement how this ought to work for the project I am working on.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To be perfectly honest as well, I'm also new to SQL and sadly, I've been tossed into the fire. Data scrubbing isn't a normal job function of mine and it sort of landed on my plate. However, I'm always keen to add more skill sets...so please forgive my lack of knowledge as I get through this...I am well aware that I'm in over my head.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 18:22:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304751#M20676</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-14T18:22:55Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304754#M20677</link>
      <description>&lt;P&gt;I appreciate the perspective; it's always helpful to get some insight to a problem. However, I don't appreciate an implication that I'm "blindly" going into a dataset without a variety of considerations.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 18:27:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304754#M20677</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-14T18:27:09Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304762#M20678</link>
      <description>&lt;P&gt;I don't think&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp;meant anything by that, other than to be helpful. &amp;nbsp;As someone who also works with e-mail addresses, I'm always surprised by the number of variations caused by "subdomains" in a single organization. &amp;nbsp;E-mail addresses have an&amp;nbsp;@ symbol and end with a "dot something", but beyond that they seem to defy your typical expected formulas.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 19:01:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304762#M20678</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2016-10-14T19:01:38Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304766#M20679</link>
      <description>That's one of the downfalls of communicating via a forum like this- no tone, no facial expression, etc...regardless of the intention of the communication, it's critically important to consider how a message can be received. Not to mention that when I have used forums like this for help in the past, there seems to be a population of elitist types who enjoy pointing out the errors that others have made...just to enhance a grandiose sense of self. To avoid that type of issue, I included my level of knowledge using the program; I am the first to admit that I don't know what I don't know...I anticipate that moving forward with additional questions as they come up, I am sure to encounter these personality types. But to your point, it's pretty amazing the amount of considerations...overwhelming, in fact...and as I slowly remove layer after layer to this half a million row problem, I discover a new consideration...ie .com is one thing...but then...com, .edu, .gov, etc etc etc....there is no single easy solution. I appreciate help in this frustrating process that is not unique to me. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Fri, 14 Oct 2016 19:11:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304766#M20679</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-14T19:11:26Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304769#M20680</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/110159"&gt;@tessa_h&lt;/a&gt; wrote:&lt;BR /&gt;...&amp;nbsp;when I have used forums like this for help in the past, there seems to be a population of elitist types who enjoy pointing out the errors that others have made...just to enhance a grandiose sense of self.&amp;nbsp;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Welcome to the SAS Support Communities -- most of the experts here answer with humility, and a genuine desire to help. &amp;nbsp;Once in a while there is a language/culture barrier (it's a global community), but if you assume that people have the best intentions, you shouldn't be disappointed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding your challenge, &lt;A href="http://www.lexjansen.com/wuss/2004/data_warehousing/c_dwdb_taming_your_charac_p2.pdf" target="_self"&gt;this paper might help&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 19:21:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304769#M20680</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2016-10-14T19:21:32Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304800#M20681</link>
      <description>&lt;P&gt;You've definitely been tossed in. I just looked up the rules for valid versus invalid email address strings, and it's pretty complicated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's a reference to the rules:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://en.wikipedia.org/wiki/Email_address" target="_self"&gt;https://en.wikipedia.org/wiki/Email_address&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'll see what I can do over the next few days...right now, I'm under a tight deadline. Regular expressions are the way to go...note that if you can find a regular expression example in Perl, it should translate well to SAS.&lt;/P&gt;
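&lt;P&gt;As a rough illustration of that translation (a sketch only; the pattern and the two sample values are made up for this example): a Perl test such as $email =~ /@[a-z0-9.-]+\.[a-z]{2,}$/i can be passed almost unchanged to PRXMATCH, which returns the position where the match starts, or 0 when there is no match.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length email $40;
  do email = 'johndoe@gmail.com', 'johndoe@gmail.com..';
    /* trim() drops the trailing blanks that SAS pads onto character values, */
    /* so the $ anchor lines up with the last real character                 */
    found = prxmatch('/@[a-z0-9.-]+\.[a-z]{2,}$/i', trim(email));
    put email= found=;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The only SAS-specific parts are the quotes around the pattern and the trim() call; the expression between the slashes is the same one Perl would use.&lt;/P&gt;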
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 21:08:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304800#M20681</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-10-14T21:08:32Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304812#M20682</link>
      <description>&lt;P&gt;I did not mean anything negative by the "blind" comment. We have new posters here with very wide differences in experience. It is not uncommon to see a "desired result" phrasing that may not do what is actually intended as specified. Often the statement is incomplete or misses a boundary condition, so that a suggestion made by community members does exactly what was requested but not what the poster actually intended.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Often a reminder to look at all of the data helps clear up rules specifications.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I do sympathize, as I have spent lots of time scrubbing data across a fairly large range of topics. One of my favorite examples: before a survey, we gave the data recorders instructions about how to enter expected names. For example, International Business Machines or IBM in any form was to be entered as "IBM". At this point take a short break and think about how many ways that may actually have been entered into the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your guess was over 10 you may be getting paranoid enough to be a good data scrubber. The actual count was 18 different forms with the most entertaining being I&amp;gt;B&amp;gt;M&amp;gt;. Which one will note immediately is the result of holding the shift key down and typing a period after each letter. Note that the original instructions specifically did NOT include a period. Other spellings involved random shifts of capitalization combined with one or more periods after letters (not always all 3 letters).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So when I saw a request similar to "trim everything after .com" I thought it might be appropriate to point out that other valid values may exist. And your response about .gov and such showed that you picked up on the issue quickly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 18:53:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304812#M20682</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-10-17T18:53:34Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304825#M20683</link>
      <description>&lt;P&gt;Cleansing email addresses is nasty business and you're starting here with a rather difficult task.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You don't have to use a lot of "SAS" though to get this job done. Perl Regular Expressions (RegEx) are great for pattern matching and pattern replacement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But first just to answer your initial question:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input email_address :$40.;
  email_address_want=email_address;

  /* Remove all non-alphabetic characters before the first alphabetic character in the email string */
  email_address_want=prxchange('s/^\s*[[:^alpha:]]+//oi',1,email_address_want);

  /*  Remove all characters after ".com" */
  email_address_want=prxchange('s/(?&amp;lt;=(\.com)).+$//oi',1,email_address_want);

  /* Alternative: Remove all non-alphanumeric characters at end of the email address */
/*  email_address_want=prxchange('s/[[:^alnum:]]+\s*$//oi',1,email_address_want);*/
datalines;
..johndoe@gmail.com
johndoe@gmail.com..
johndoe@gmail.com.au&amp;amp;%$.
*&amp;amp;^%$5467johndoe@gmail.org&amp;amp;%$.
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
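&lt;P&gt;Since the question was later broadened to endings such as .gov, .ca and .edu, a more general variant might strip junk from both ends without naming any particular ending. The following is a minimal sketch rather than a tested production rule: the dataset name want_generic and the variable email_clean are made up for illustration, it reads the have dataset created above, and it assumes that a usable address starts with a letter or digit and ends with a letter.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want_generic;
  set have;
  length email_clean $40;

  /* Drop everything before the first letter or digit */
  email_clean = prxchange('s/^[^a-z0-9]+//io', 1, email_address);

  /* Drop every non-letter at the end of the value; trailing blanks */
  /* count as non-letters, so the pattern also absorbs the padding  */
  email_clean = prxchange('s/[^a-z]+$//io', 1, email_clean);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because nothing in the middle of the string is touched, an address like bill@this.companyname.com is left alone, which a rule keyed on ".com" would not guarantee.&lt;/P&gt;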
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS implemented RegEx and it's available via SAS functions and call routines starting with PRX...&lt;/P&gt;
&lt;P&gt;&lt;A href="https://support.sas.com/documentation/cdl/en/lefunctionsref/67960/HTML/default/viewer.htm#p0w6napahk6x0an0z2dzozh2ouzm.htm" target="_blank"&gt;https://support.sas.com/documentation/cdl/en/lefunctionsref/67960/HTML/default/viewer.htm#p0w6napahk6x0an0z2dzozh2ouzm.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003288497.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003288497.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf" target="_blank"&gt;https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;RegEx are widely used and email verification is a common application for RegEx. You will find heaps of examples on the Internet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What I would do is first implement a RegEx for email verification, that is, a RegEx which defines the pattern for a valid email address (search the Internet for such a RegEx as a starting point).&lt;/P&gt;
&lt;P&gt;Then create a "bad" file with all email addresses not conforming to this pattern, and start developing and testing your RegEx for cleansing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As others already pointed out the actual RFC standard for valid email addresses is astonishingly wide. Depending on where your data comes from (like anything from the Internet or just a set of company email &amp;nbsp;addresses) you will want to narrow down the pattern for valid email addresses to suit your data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you don't know RegEx syntax: It's a bit tedious but well worth learning.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Oct 2016 01:45:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304825#M20683</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2016-10-15T01:45:50Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304868#M20684</link>
      <description>&lt;P&gt;Just to get you started, I pulled a simple Regex example off of the internet, from&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.regular-expressions.info/email.html" target="_self"&gt;http://www.regular-expressions.info/email.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's a little piece of code that will use the example regex to divide the addresses into good or bad.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BTW, I&amp;nbsp;can confirm&amp;nbsp;what everyone has said about &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;. What was said was only in the spirit of being helpful; any other interpretation is due to the nature of the conversation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Tom&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="2"&gt;&lt;STRONG&gt;data&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt; Good Bad;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;if&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; _n_ = &lt;/FONT&gt;&lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="2"&gt;&lt;FONT color="#008080" face="Courier New" size="2"&gt;&lt;FONT color="#008080" face="Courier New" size="2"&gt;1&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;then&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;do&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt;;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;retain&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; PRX1;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; PRX1 = prxparse(&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="2"&gt;&lt;FONT color="#800080" face="Courier New" size="2"&gt;&lt;FONT color="#800080" face="Courier New" size="2"&gt;"/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\s*$/i"&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt;);&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;drop&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; PRX1;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;end&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt;;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;set&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; Have;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;if&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; prxmatch(PRX1, TestString) &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;then&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;output&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; Good;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;else&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;&lt;FONT color="#0000ff" face="Courier New" size="2"&gt;output&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt; Bad;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="2"&gt;&lt;FONT color="#000080" face="Courier New" size="2"&gt;&lt;FONT color="#000080" face="Courier New" size="2"&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT face="Courier New" size="2"&gt;;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Oct 2016 13:29:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/304868#M20684</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-10-15T13:29:49Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/305052#M20687</link>
      <description>&lt;P&gt;Ohhh yes, I'm very aware of the billion different ways a person can enter in info...back in the day when all I DID was scrub data...how many times do you think the word "feline" can be messed up?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;feline&lt;/P&gt;&lt;P&gt;feelin&lt;/P&gt;&lt;P&gt;feeline&lt;/P&gt;&lt;P&gt;feilin&lt;/P&gt;&lt;P&gt;filin&lt;/P&gt;&lt;P&gt;feillin&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yeah...that was always fun... &lt;span class="lia-unicode-emoji" title=":neutral_face:"&gt;😐&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 11:42:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/305052#M20687</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-17T11:42:03Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/305073#M20688</link>
      <description>&lt;P&gt;Hi All:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;First of all- THANK YOU for speaking up! All of this info is helpful in pointing me in the right direction. It also helps to know that what I am trying to accomplish is a big job. I can't tell you how much it helps to be validated around my concerns.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can see that there are many ways to skin this cat (sorry for the grotesque pun...but it's a grotesque job!)...and figuring out the appropriate method is going to be a task in and of itself.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've reached out to some colleagues for assistance- turns out that we have a process in place for validating emails; I will be meeting with the gentleman who has established this process. I hope to share the information you all have provided me to see how his methodology compares. I'd like to use coding from you-all and him...and build it right into my workflow. I will let you know what his response and method is.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again- and my apologies if I have previously offended anyone; like I had mentioned before, sometimes finding help on forums can be challenging due to communication barriers and egos. &lt;span class="lia-unicode-emoji" title=":face_with_tongue:"&gt;😛&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-Tessa&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 13:11:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/305073#M20688</guid>
      <dc:creator>tessa_h</dc:creator>
      <dc:date>2016-10-17T13:11:29Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG to Clean Email Addresses</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/305745#M20721</link>
      <description>&lt;P&gt;I thought it worth mentioning that SAS's DataFlux product contains functionality for cleaning email addresses, along with all sorts of other data. This interesting paper explains more:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/resources/papers/proceedings15/SAS1852-2015.pdf" target="_blank"&gt;http://support.sas.com/resources/papers/proceedings15/SAS1852-2015.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Oct 2016 19:30:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-EG-to-Clean-Email-Addresses/m-p/305745#M20721</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2016-10-19T19:30:11Z</dc:date>
    </item>
  </channel>
</rss>

