<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Regular Expressions/PRX Patterns in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152849#M40209</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Can you assume that data within one data set is "clean"? Just something to think about. &lt;/P&gt;&lt;P&gt;I like Fried Egg's response from this thread a while back. &lt;/P&gt;&lt;P&gt;&lt;A __default_attr="36713" __jive_macro_name="thread" class="jive_macro jive_macro_thread" href="https://communities.sas.com/"&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 17 Dec 2014 01:26:13 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2014-12-17T01:26:13Z</dc:date>
    <item>
      <title>Regular Expressions/PRX Patterns</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152847#M40207</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am new to regular expressions in SAS and am trying to use them for cleaning a few variables in a dataset. The variables are Company_Name and Address. The requirements for company name are as follows:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;-Should not have special characters (except hyphen)&lt;/SPAN&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;-Should not have country names embedded in it, e.g. Hyundai USA Inc.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;-Should not have any irrelevant text like 'UNKNOWN', 'TBD' etc. (I can add these to the code as required)&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;-Should not begin with a number&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;-Should be allowed to include words like LTD, Corp, Inc. and so on.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;Ideally the cleaned data should look like "Test Company Inc." or "Test Company Corp".&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;I need to work along similar lines for addresses too.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;I know this is a lot to ask, but I would really appreciate some help with the coding part of this. Thank you.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 16 Dec 2014 22:57:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152847#M40207</guid>
      <dc:creator>sasmaverick</dc:creator>
      <dc:date>2014-12-16T22:57:19Z</dc:date>
    </item>
    <item>
      <title>Re: Regular Expressions/PRX Patterns</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152848#M40208</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Are you looking to show possible problem values or automagically edit them?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And what if the country name is part of the company name such as Air France?&lt;/P&gt;&lt;P&gt;And a company such as 3M is treated how?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Some ideas in: &lt;A _jive_internal="true" href="https://communities.sas.com/thread/63773"&gt;String Grouping &amp;amp; Matching&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Dec 2014 00:50:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152848#M40208</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2014-12-17T00:50:49Z</dc:date>
    </item>
    <item>
      <title>Re: Regular Expressions/PRX Patterns</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152849#M40209</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Can you assume that data within one data set is "clean"? Just something to think about. &lt;/P&gt;&lt;P&gt;I like Fried Egg's response from this thread a while back. &lt;/P&gt;&lt;P&gt;&lt;A __default_attr="36713" __jive_macro_name="thread" class="jive_macro jive_macro_thread" href="https://communities.sas.com/"&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Dec 2014 01:26:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152849#M40209</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-12-17T01:26:13Z</dc:date>
    </item>
    <item>
      <title>Re: Regular Expressions/PRX Patterns</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152850#M40210</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The code below won't solve everything for you, but hopefully it will give you at least some inspiration.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data have;&lt;/P&gt;&lt;P&gt;&amp;nbsp; infile datalines truncover;&lt;/P&gt;&lt;P&gt;&amp;nbsp; input Company_Name $50.;&lt;/P&gt;&lt;P&gt;&amp;nbsp; datalines;&lt;/P&gt;&lt;P&gt;Hyundai USA Inc.&lt;/P&gt;&lt;P&gt;Hyundai USA Inc.&lt;/P&gt;&lt;P&gt;Hyundai-Inc.&lt;/P&gt;&lt;P&gt;Hyundai_Inc.&lt;/P&gt;&lt;P&gt;UNKNOWN Inc.&lt;/P&gt;&lt;P&gt;TBD&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data lookup;&lt;/P&gt;&lt;P&gt;&amp;nbsp; infile datalines truncover;&lt;/P&gt;&lt;P&gt;&amp;nbsp; input word $30.;&lt;/P&gt;&lt;P&gt;&amp;nbsp; datalines;&lt;/P&gt;&lt;P&gt;UNKNOWN&lt;/P&gt;&lt;P&gt;TBD&lt;/P&gt;&lt;P&gt;XXX&lt;/P&gt;&lt;P&gt;USA&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set have;&lt;/P&gt;&lt;P&gt;&amp;nbsp; length Company_Name_Cleansed $50.;&lt;/P&gt;&lt;P&gt;&amp;nbsp; Company_Name_Cleansed=Company_Name;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; /** replace all special characters except period and hyphen with a blank **/&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; /** &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="http://support.sas.com/documentation/cdl/en/lefunctionsref/67398/HTML/default/viewer.htm#p0s9ilagexmjl8n1u7e1t1jfnzlk.htm"&gt;http://support.sas.com/documentation/cdl/en/lefunctionsref/67398/HTML/default/viewer.htm#p0s9ilagexmjl8n1u7e1t1jfnzlk.htm&lt;/A&gt;&lt;SPAN&gt; **/&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; Company_Name_Cleansed=prxchange('s/[^.[:alnum:]-]/ /oi',-1,Company_Name_Cleansed);&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; /** remove unwanted words from the name (the commented-out lines would set the whole name to missing instead) **/&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then set lookup;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash h1(dataset:'lookup');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineKey('word');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineData('word');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h1.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hiter hit1('h1');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc = hit1.first();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while (_rc = 0);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if findw(Company_Name_Cleansed,strip(word),' ','i')&amp;gt;0 then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;/P&gt;&lt;P&gt;/*&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(Company_Name_Cleansed);*/&lt;/P&gt;&lt;P&gt;/*&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; leave;*/&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Company_Name_Cleansed=transtrn(Company_Name_Cleansed,strip(word),'');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc = hit1.next();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Company_Name_Cleansed=compbl(left(Company_Name_Cleansed));&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Dec 2014 01:38:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152850#M40210</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2014-12-17T01:38:07Z</dc:date>
    </item>
    <item>
      <title>Re: Regular Expressions/PRX Patterns</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152851#M40211</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Patrick,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for the help. Your logic is really helping me a lot. I am just wondering how to match addresses ending with digits using PRX patterns. For example, I want to flag everything ENDING with digits, like "Napa Valley Street 10", "Rodeo Drive Suite 212", "Hwy 500".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks again for your time.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 06:32:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152851#M40211</guid>
      <dc:creator>sasmaverick</dc:creator>
      <dc:date>2014-12-19T06:32:33Z</dc:date>
    </item>
    <item>
      <title>Re: Regular Expressions/PRX Patterns</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152852#M40212</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Something like the code below could do.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data test;&lt;/P&gt;&lt;P&gt;&amp;nbsp; infile datalines truncover;&lt;/P&gt;&lt;P&gt;&amp;nbsp; input string $100.;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; /* prxmatch returns the start position of the match, or 0 if there is none */&lt;/P&gt;&lt;P&gt;&amp;nbsp; if prxmatch('/\d+ *$/oi',string)&amp;gt; 0 then &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end_num_flg=1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; else end_num_flg=0;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;datalines;&lt;/P&gt;&lt;P&gt;Napa Valley Street 10&lt;/P&gt;&lt;P&gt;Rodeo Drive Suite 212&lt;/P&gt;&lt;P&gt;Hwy 500&lt;/P&gt;&lt;P&gt;Rodeo Drive 10 Suite&lt;/P&gt;&lt;P&gt;25 Hwy&lt;/P&gt;&lt;P&gt;some other street &lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Dec 2014 07:15:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Regular-Expressions-PRX-Patterns/m-p/152852#M40212</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2014-12-19T07:15:34Z</dc:date>
    </item>
  </channel>
</rss>

